[Image: Josiah Willard Gibbs]

In information theory, Gibbs' inequality is a statement about the information entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality.
It was first presented by J. Willard Gibbs in the 19th century.
Gibbs' inequality

Suppose that
:$P = \{ p_1, \ldots, p_n \}$
is a discrete probability distribution. Then for any other probability distribution
:$Q = \{ q_1, \ldots, q_n \}$
the following inequality between positive quantities (since the $p_i$ and $q_i$ are between zero and one) holds:
:$-\sum_{i=1}^n p_i \log p_i \leq -\sum_{i=1}^n p_i \log q_i$
with equality if and only if
:$p_i = q_i$
for all $i$. Put in words, the information entropy of a distribution $P$ is less than or equal to its cross entropy with any other distribution $Q$.
The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:
:$D_{\mathrm{KL}}(P \| Q) \equiv \sum_{i=1}^n p_i \log \frac{p_i}{q_i} \geq 0.$
Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
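The inequality can be checked numerically. The following Python sketch (the helper functions `entropy` and `cross_entropy` are illustrative, not from any particular library) verifies that the entropy of an example distribution $P$ does not exceed its cross entropy with another distribution $Q$, and that their difference, the Kullback–Leibler divergence, is non-negative:

```python
import math

def entropy(p, base=2):
    """Shannon entropy H(P) = -sum_i p_i log p_i, with 0 log 0 taken as 0."""
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

def cross_entropy(p, q, base=2):
    """Cross entropy H(P, Q) = -sum_i p_i log q_i."""
    return -sum(pi * math.log(qi, base) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.25, 0.25]
Q = [0.4, 0.4, 0.2]

kl = cross_entropy(P, Q) - entropy(P)  # D_KL(P || Q)

assert entropy(P) <= cross_entropy(P, Q)   # Gibbs' inequality
assert kl >= 0                             # equivalent KL form
assert cross_entropy(P, P) == entropy(P)   # equality when Q = P
```

Since $Q \neq P$ here, the divergence is strictly positive, matching the equality condition.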
Proof

For simplicity, we prove the statement using the natural logarithm, $\ln$. Since
:$\log_b a = \frac{\ln a}{\ln b},$
the particular logarithm we choose only scales the relationship by a constant factor.

Let $I$ denote the set of all $i$ for which $p_i$ is non-zero. Then, since $\ln x \leq x-1$ for all $x > 0$, with equality if and only if $x=1$, we have:
:$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq -\sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \geq 0$
The last inequality is a consequence of the $p_i$ and $q_i$ being part of a probability distribution. Specifically, the sum of all the non-zero $p_i$ is 1. Some non-zero $q_i$, however, may have been excluded, since the choice of indices is conditioned on the $p_i$ being non-zero. Therefore the sum of the $q_i$ over $I$ may be less than 1.
So far, over the index set $I$, we have:
:$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq 0,$
or equivalently
:$-\sum_{i \in I} p_i \ln q_i \geq -\sum_{i \in I} p_i \ln p_i.$
Both sums can be extended to all $i = 1, \ldots, n$, i.e. including $p_i = 0$, by recalling that the expression $p \ln p$ tends to 0 as $p$ tends to 0, and $(-\ln q)$ tends to $\infty$ as $q$ tends to 0. We arrive at
:$-\sum_{i=1}^n p_i \ln q_i \geq -\sum_{i=1}^n p_i \ln p_i$
For equality to hold, we require
# $\frac{q_i}{p_i} = 1$ for all $i \in I$, so that the equality $\ln \frac{q_i}{p_i} = \frac{q_i}{p_i} - 1$ holds,
# and $\sum_{i \in I} q_i = 1$, which means $q_i = 0$ if $i \notin I$, that is, $q_i = 0$ whenever $p_i = 0$.
This can happen if and only if $p_i = q_i$ for $i = 1, \ldots, n$.

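As a sanity check on the steps above, the following Python sketch (the distributions are arbitrary examples of ours) verifies the termwise bound $\ln x \leq x - 1$ and the resulting chain of inequalities over the support $I$, using a $P$ with a zero entry so that the $q_i$ sum to strictly less than 1 over $I$:

```python
import math

p = [0.6, 0.0, 0.4]   # p has a zero entry, so that index is excluded from I
q = [0.3, 0.5, 0.2]

# I: indices where p_i is non-zero
I = [i for i in range(len(p)) if p[i] > 0]

# Termwise: ln x <= x - 1 with x = q_i / p_i
for i in I:
    x = q[i] / p[i]
    assert math.log(x) <= x - 1

lhs = -sum(p[i] * math.log(q[i] / p[i]) for i in I)
mid = -sum(p[i] * (q[i] / p[i] - 1) for i in I)   # = 1 - (sum of q_i over I)

assert lhs >= mid >= 0            # the chain of inequalities in the proof
assert sum(q[i] for i in I) <= 1  # here 0.3 + 0.2 = 0.5 < 1
```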
Alternative proofs

The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback–Leibler divergence is a form of Bregman divergence. Below we give a proof based on Jensen's inequality.

Because $\log$ is a concave function, we have:
:$\sum_i p_i \log \frac{q_i}{p_i} \leq \log \sum_i p_i \frac{q_i}{p_i} = \log \sum_i q_i \leq 0,$
where the first inequality is due to Jensen's inequality, and the last inequality holds for the same reason given in the proof above: the $q_i$, summed over the support of $P$, total at most 1.

Furthermore, since $\log$ is strictly concave, the equality condition of Jensen's inequality gives equality exactly when
:$\frac{q_1}{p_1} = \frac{q_2}{p_2} = \cdots = \frac{q_n}{p_n}$
and
:$\sum_i q_i = 1.$
Suppose that this common ratio is $\sigma$. Then we have
:$1 = \sum_i q_i = \sum_i \sigma p_i = \sigma,$
where we use the fact that $p$ and $q$ are probability distributions. Therefore equality holds when $p = q$.
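The Jensen step can likewise be checked numerically. In this Python sketch the example distributions are arbitrary choices of ours; since $\sum_i q_i = 1$ here, the right-hand side equals $\log 1 = 0$ up to floating-point rounding:

```python
import math

p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

# Jensen for the concave log: sum_i p_i log(X_i) <= log(sum_i p_i X_i),
# with X_i = q_i / p_i, so the right-hand side is log(sum_i q_i).
lhs = sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))
rhs = math.log(sum(pi * (qi / pi) for pi, qi in zip(p, q)))

assert lhs <= rhs <= 1e-12   # rhs is log(1) = 0 up to rounding
```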

Corollary

The entropy of $P$ is bounded by:
:$H(p_1, \ldots, p_n) \leq \log n.$
The proof is trivial – simply set $q_i = 1/n$ for all $i$.
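The corollary is easy to check numerically. In the following Python sketch (the helper `entropy_nats` is illustrative, not from any particular library), the entropy of several example distributions stays below $\ln n$, with the bound attained by the uniform distribution:

```python
import math

def entropy_nats(p):
    """Entropy in nats: -sum_i p_i ln p_i, with 0 ln 0 taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# H(P) <= ln n for a few example distributions over n = 3 outcomes
for p in ([0.5, 0.25, 0.25], [0.9, 0.05, 0.05], [1.0, 0.0, 0.0]):
    assert entropy_nats(p) <= math.log(len(p))

# The bound is attained exactly by the uniform distribution q_i = 1/n:
uniform = [1/3, 1/3, 1/3]
assert abs(entropy_nats(uniform) - math.log(3)) < 1e-12
```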
See also

* Information entropy
* Bregman divergence
* Log sum inequality
References

{{Reflist}}