TheInfoList

OR:

200px, Josiah Willard Gibbs In
information theory Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. ...
, Gibbs' inequality is a statement about the
information entropy In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable X, which takes values in the alphabet \ ...
of a discrete
probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality. It was first presented by J. Willard Gibbs in the 19th century.

# Gibbs' inequality

Suppose that :$P = \$ is a discrete
probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
. Then for any other probability distribution :$Q = \$ the following inequality between positive quantities (since pi and qi are between zero and one) holds: :$- \sum_^n p_i \log p_i \leq - \sum_^n p_i \log q_i$ with equality if and only if :$p_i = q_i$ for all ''i''. Put in words, the
information entropy In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable X, which takes values in the alphabet \ ...
of a distribution P is less than or equal to its cross entropy with any other distribution Q. The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written: :$D_\left(P\, Q\right) \equiv \sum_^n p_i \log \frac \geq 0.$ Note that the use of base-2
logarithm In mathematics, the logarithm is the inverse function to exponentiation. That means the logarithm of a number  to the base  is the exponent to which must be raised, to produce . For example, since , the ''logarithm base'' 10 ...
s is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
s.

# Proof

For simplicity, we prove the statement using the natural logarithm (ln), since :$\log a = \frac,$ the particular logarithm we choose only scales the relationship. Let $I$ denote the set of all $i$ for which ''pi'' is non-zero. Then, since $\ln x \leq x-1$ for all ''x > 0'', with equality if and only if ''x=1'', we have: :$- \sum_ p_i \ln \frac \geq - \sum_ p_i \left\left( \frac - 1 \right\right)$$= - \sum_ q_i + \sum_ p_i = - \sum_ q_i + 1 \geq 0$ The last inequality is a consequence of the ''pi'' and ''qi'' being part of a probability distribution. Specifically, the sum of all non-zero values is 1. Some non-zero ''qi'', however, may have been excluded since the choice of indices is conditioned upon the ''pi'' being non-zero. Therefore the sum of the ''qi'' may be less than 1. So far, over the index set $I$, we have: :$- \sum_ p_i \ln \frac \geq 0$, or equivalently :$- \sum_ p_i \ln q_i \geq - \sum_ p_i \ln p_i$. Both sums can be extended to all $i=1, \ldots, n$, i.e. including $p_i=0$, by recalling that the expression $p \ln p$ tends to 0 as $p$ tends to 0, and $\left(-\ln q\right)$ tends to $\infty$ as $q$ tends to 0. We arrive at :$- \sum_^n p_i \ln q_i \geq - \sum_^n p_i \ln p_i$ For equality to hold, we require # $\frac = 1$ for all $i \in I$ so that the equality $\ln \frac = \frac -1$ holds, # and $\sum_ q_i = 1$ which means $q_i=0$ if $i\notin I$, that is, $q_i=0$ if $p_i=0$. This can happen if and only if $p_i = q_i$ for $i = 1, \ldots, n$.

# Alternative proofs

The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback-Leibler divergence is a form of Bregman divergence. Below we give a proof based on Jensen's inequality: Because log is a concave function, we have that: :$\sum_i p_i \log\frac \le \log\sum_i p_i\frac = \log\sum_i q_i \le 0$ Where the first inequality is due to Jensen's inequality, and the last equality is due to the same reason given in the above proof. Furthermore, since $\log$ is strictly concave, by the equality condition of Jensen's inequality we get equality when :$\frac = \frac = \cdots = \frac$ and :$\sum_i q_i = 1$ Suppose that this ratio is $\sigma$, then we have that :$1 = \sum_i q_i = \sum_i \sigma p_i = \sigma$ Where we use the fact that $p, q$ are probability distributions. Therefore the equality happens when $p = q$.

# Corollary

The
entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
of $P$ is bounded by: :$H\left(p_1, \ldots , p_n\right) \leq \log n.$ The proof is trivial – simply set $q_i = 1/n$ for all ''i''.