[Image: Josiah Willard Gibbs]

In information theory, Gibbs' inequality is a statement about the information entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality.
It was first presented by J. Willard Gibbs in the 19th century.
Gibbs' inequality

Suppose that
:$P = \{ p_1, \ldots, p_n \}$
is a discrete probability distribution. Then for any other probability distribution
:$Q = \{ q_1, \ldots, q_n \}$
the following inequality between positive quantities (since the $p_i$ and $q_i$ are between zero and one) holds:
:$-\sum_{i=1}^n p_i \log p_i \leq -\sum_{i=1}^n p_i \log q_i$
with equality if and only if
:$p_i = q_i$
for all $i$. Put in words, the information entropy of a distribution $P$ is less than or equal to its cross entropy with any other distribution $Q$.
The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:
:$D_{\mathrm{KL}}(P \| Q) \equiv \sum_{i=1}^n p_i \log \frac{p_i}{q_i} \geq 0.$
Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
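The inequality can be checked numerically. The following Python sketch (the helper functions `entropy` and `cross_entropy` are illustrative, not from any particular library) verifies that the entropy of an example distribution $P$ does not exceed its cross entropy with another distribution $Q$, and that their difference, the Kullback–Leibler divergence, is non-negative:

```python
import math

def entropy(p, base=2):
    """Shannon entropy H(P) = -sum_i p_i log p_i, with 0 log 0 taken as 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2):
    """Cross entropy H(P, Q) = -sum_i p_i log q_i."""
    return -sum(pi * math.log(qi, base) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.25, 0.25]
Q = [0.4, 0.4, 0.2]

kl = cross_entropy(P, Q) - entropy(P)  # D_KL(P || Q)

assert entropy(P) <= cross_entropy(P, Q)   # Gibbs' inequality
assert kl >= 0                             # equivalent KL form
assert cross_entropy(P, P) == entropy(P)   # equality when Q = P
```

Since $Q \neq P$ here, the divergence is strictly positive, matching the equality condition.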
Proof

For simplicity, we prove the statement using the natural logarithm, $\ln$. Since
:$\log_b a = \frac{\ln a}{\ln b},$
the particular logarithm we choose only scales the relationship by a constant factor.

Let $I$ denote the set of all $i$ for which $p_i$ is non-zero. Then, since $\ln x \leq x-1$ for all $x > 0$, with equality if and only if $x=1$, we have:
:$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq -\sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \geq 0$
The last inequality is a consequence of the $p_i$ and $q_i$ being part of a probability distribution. Specifically, the sum of all the non-zero $p_i$ is 1. Some non-zero $q_i$, however, may have been excluded, since the choice of indices is conditioned on the $p_i$ being non-zero. Therefore the sum of the $q_i$ over $I$ may be less than 1.
So far, over the index set $I$, we have:
:$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq 0,$
or equivalently
:$-\sum_{i \in I} p_i \ln q_i \geq -\sum_{i \in I} p_i \ln p_i.$
Both sums can be extended to all $i = 1, \ldots, n$, i.e. including $p_i = 0$, by recalling that the expression $p \ln p$ tends to 0 as $p$ tends to 0, and $(-\ln q)$ tends to $\infty$ as $q$ tends to 0. We arrive at
:$-\sum_{i=1}^n p_i \ln q_i \geq -\sum_{i=1}^n p_i \ln p_i$
For equality to hold, we require
# $\frac{q_i}{p_i} = 1$ for all $i \in I$, so that the equality $\ln \frac{q_i}{p_i} = \frac{q_i}{p_i} - 1$ holds,
# and $\sum_{i \in I} q_i = 1$, which means $q_i = 0$ if $i \notin I$, that is, $q_i = 0$ whenever $p_i = 0$.
This can happen if and only if $p_i = q_i$ for $i = 1, \ldots, n$.

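As a sanity check on the steps above, the following Python sketch (the distributions are arbitrary examples of ours) verifies the termwise bound $\ln x \leq x - 1$ and the resulting chain of inequalities over the support $I$, using a $P$ with a zero entry so that the $q_i$ sum to strictly less than 1 over $I$:

```python
import math

p = [0.6, 0.0, 0.4]   # p has a zero entry, so that index is excluded from I
q = [0.3, 0.5, 0.2]

# I: indices where p_i is non-zero
I = [i for i in range(len(p)) if p[i] > 0]

# Termwise: ln x <= x - 1 with x = q_i / p_i
for i in I:
    x = q[i] / p[i]
    assert math.log(x) <= x - 1

lhs = -sum(p[i] * math.log(q[i] / p[i]) for i in I)
mid = -sum(p[i] * (q[i] / p[i] - 1) for i in I)   # = 1 - (sum of q_i over I)

assert lhs >= mid >= 0            # the chain of inequalities in the proof
assert sum(q[i] for i in I) <= 1  # here 0.3 + 0.2 = 0.5 < 1
```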
Alternative proofs

The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback–Leibler divergence is a form of Bregman divergence. Below we give a proof based on Jensen's inequality.

Because $\log$ is a concave function, we have:
:$\sum_i p_i \log \frac{q_i}{p_i} \leq \log \sum_i p_i \frac{q_i}{p_i} = \log \sum_i q_i \leq 0,$
where the first inequality is due to Jensen's inequality, and the last inequality holds for the same reason given in the proof above: the $q_i$, summed over the support of $P$, total at most 1.

Furthermore, since $\log$ is strictly concave, the equality condition of Jensen's inequality gives equality exactly when
:$\frac{q_1}{p_1} = \frac{q_2}{p_2} = \cdots = \frac{q_n}{p_n}$
and
:$\sum_i q_i = 1.$
Suppose that this common ratio is $\sigma$. Then we have
:$1 = \sum_i q_i = \sum_i \sigma p_i = \sigma,$
where we use the fact that $p$ and $q$ are probability distributions. Therefore equality holds when $p = q$.
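The Jensen step can likewise be checked numerically. In this Python sketch the example distributions are arbitrary choices of ours; since $\sum_i q_i = 1$ here, the right-hand side equals $\log 1 = 0$ up to floating-point rounding:

```python
import math

p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

# Jensen for the concave log: sum_i p_i log(X_i) <= log(sum_i p_i X_i),
# with X_i = q_i / p_i, so the right-hand side is log(sum_i q_i).
lhs = sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))
rhs = math.log(sum(pi * (qi / pi) for pi, qi in zip(p, q)))

assert lhs <= rhs <= 1e-12   # rhs is log(1) = 0 up to rounding
```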

Corollary

The entropy of $P$ is bounded by:
:$H(p_1, \ldots, p_n) \leq \log n.$
The proof is trivial – simply set $q_i = 1/n$ for all $i$.
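The corollary is easy to check numerically. In the following Python sketch (the helper `entropy_nats` is illustrative, not from any particular library), the entropy of several example distributions stays below $\ln n$, with the bound attained by the uniform distribution:

```python
import math

def entropy_nats(p):
    """Entropy in nats: -sum_i p_i ln p_i, with 0 ln 0 taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# H(P) <= ln n for a few example distributions over n = 3 outcomes
for p in ([0.5, 0.25, 0.25], [0.9, 0.05, 0.05], [1.0, 0.0, 0.0]):
    assert entropy_nats(p) <= math.log(len(p))

# The bound is attained exactly by the uniform distribution q_i = 1/n:
uniform = [1/3, 1/3, 1/3]
assert abs(entropy_nats(uniform) - math.log(3)) < 1e-12
```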
See also

* Information entropy
* Bregman divergence
* Log sum inequality
References

{{Reflist}}