In information theory, Gibbs' inequality is a statement about the information entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality. It was first presented by J. Willard Gibbs in the 19th century.
Gibbs' inequality
Suppose that
:<math>P = \{ p_1 , \ldots , p_n \}</math>
is a discrete probability distribution. Then for any other probability distribution
:<math>Q = \{ q_1 , \ldots , q_n \}</math>
the following inequality between positive quantities (since the ''p''<sub>''i''</sub> and ''q''<sub>''i''</sub> are between zero and one) holds:
:<math>- \sum_{i=1}^n p_i \log_2 p_i \leq - \sum_{i=1}^n p_i \log_2 q_i</math>
with equality if and only if
:<math>p_i = q_i</math>
for all ''i''. Put in words, the information entropy of a distribution ''P'' is less than or equal to its cross entropy with any other distribution ''Q''.

The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:
:<math>D_{\mathrm{KL}}(P \| Q) \equiv \sum_{i=1}^n p_i \log_2 \frac{p_i}{q_i} \geq 0.</math>
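The inequality can be illustrated numerically. The following short Python sketch (an illustrative check, not part of the formal statement; the example distributions are arbitrary) computes the entropy, the cross entropy, and the Kullback–Leibler divergence for one choice of ''P'' and ''Q'':

<syntaxhighlight lang="python">
import math

# Arbitrary example distributions (illustrative only).
p = [0.5, 0.25, 0.25]   # distribution P
q = [0.4, 0.4, 0.2]     # any other distribution Q

entropy = -sum(pi * math.log2(pi) for pi in p)                     # H(P)
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))   # H(P, Q)
kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))          # D_KL(P || Q)

assert entropy <= cross_entropy                    # Gibbs' inequality
assert abs(cross_entropy - entropy - kl) < 1e-12   # D_KL is exactly the gap
assert kl >= 0
</syntaxhighlight>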
Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
Proof
For simplicity, we prove the statement using the natural logarithm, since
:<math>\log_a b = \frac{\ln b}{\ln a},</math>
the particular logarithm we choose only scales the relationship.
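As an illustrative aside (not part of the original proof; the example distributions are arbitrary), the scaling can be checked numerically: computing the divergence in nats and in bits differs only by the constant factor <math>1/\ln 2</math>.

<syntaxhighlight lang="python">
import math

# Arbitrary example distributions (illustrative only).
p = [0.7, 0.3]
q = [0.5, 0.5]

kl_nats = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))    # natural log
kl_bits = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))   # base-2 log

# Changing the base only rescales the quantity by 1 / ln 2.
assert abs(kl_bits - kl_nats / math.log(2)) < 1e-12
</syntaxhighlight>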
Let <math>I</math> denote the set of all <math>i</math> for which ''p''<sub>''i''</sub> is non-zero. Then, since
:<math>\ln x \leq x - 1</math>
for all ''x'' > 0, with equality if and only if ''x'' = 1, we have:
:<math>-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq -\sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \geq 0</math>
The last inequality is a consequence of the ''p''<sub>''i''</sub> and ''q''<sub>''i''</sub> being part of a probability distribution. Specifically, the sum of all non-zero ''p''<sub>''i''</sub> is 1. Some non-zero ''q''<sub>''i''</sub>, however, may have been excluded from the sum, since the choice of indices is conditioned upon the ''p''<sub>''i''</sub> being non-zero. Therefore the sum of the ''q''<sub>''i''</sub> may be less than 1.
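The elementary bound and the resulting sum estimate can be checked numerically; the following sketch is illustrative (the values are arbitrary, and ''q'' is taken to be the restriction of a larger distribution ''Q'' to the indices in <math>I</math>, so it sums to less than 1):

<syntaxhighlight lang="python">
import math

# The pointwise bound ln x <= x - 1, with equality only at x = 1.
for x in [0.1, 0.5, 1.0, 2.0, 10.0]:
    assert math.log(x) <= x - 1

# Restriction to the indices where p_i > 0: p sums to 1, q may sum to less.
p = [0.6, 0.4]
q = [0.3, 0.5]

lhs = -sum(pi * math.log(qi / pi) for pi, qi in zip(p, q))
rhs = 1 - sum(q)        # = -sum(q_i) + sum(p_i) over the index set I
assert lhs >= rhs >= 0
</syntaxhighlight>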
So far, over the index set <math>I</math>, we have:
:<math>-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq 0,</math>
or equivalently
:<math>-\sum_{i \in I} p_i \ln q_i \geq -\sum_{i \in I} p_i \ln p_i.</math>
Both sums can be extended to all <math>i = 1, \ldots, n</math>, i.e. including <math>p_i = 0</math>, by recalling that the expression <math>p \ln p</math> tends to 0 as <math>p</math> tends to 0, and <math>(-\ln q)</math> tends to <math>+\infty</math> as <math>q</math> tends to 0. We arrive at
:<math>-\sum_{i=1}^n p_i \ln q_i \geq -\sum_{i=1}^n p_i \ln p_i</math>
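In a numerical implementation, the same limiting conventions are applied by skipping terms with <math>p_i = 0</math> and treating <math>q_i = 0</math> (with <math>p_i > 0</math>) as an infinite cross entropy. A minimal illustrative sketch (the function names are chosen here for illustration only):

<syntaxhighlight lang="python">
import math

def entropy_nats(p):
    # Terms with p_i = 0 are skipped, using the convention p ln p -> 0.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy_nats(p, q):
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue              # p ln p -> 0 as p -> 0
        if qi == 0:
            return math.inf       # -ln q -> +infinity as q -> 0
        total -= pi * math.log(qi)
    return total

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
assert entropy_nats(p) <= cross_entropy_nats(p, q)
</syntaxhighlight>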
For equality to hold, we require
# <math>\frac{q_i}{p_i} = 1</math> for all <math>i \in I</math> so that the equality <math>\ln \frac{q_i}{p_i} = \frac{q_i}{p_i} - 1</math> holds,
# and <math>\sum_{i \in I} q_i = 1,</math> which means <math>q_i = 0</math> if <math>i \notin I</math>, that is, <math>q_i = 0</math> if <math>p_i = 0</math>.

This can happen if and only if <math>p_i = q_i</math> for <math>i = 1, \ldots, n</math>.
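A small numerical illustration of the equality condition (the example values are arbitrary): choosing <math>Q = P</math> makes the two sides coincide, while any other <math>Q</math> gives a strict inequality.

<syntaxhighlight lang="python">
import math

p = [0.2, 0.3, 0.5]
h = -sum(pi * math.log(pi) for pi in p)                        # H(P) in nats

h_cross_p = -sum(pi * math.log(qi) for pi, qi in zip(p, p))    # Q = P
assert abs(h - h_cross_p) < 1e-12                              # equality

q = [0.2, 0.35, 0.45]                                          # Q != P
h_cross_q = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
assert h < h_cross_q                                           # strict inequality
</syntaxhighlight>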
Alternative proofs
The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback–Leibler divergence is a form of Bregman divergence. Below we give a proof based on Jensen's inequality:

Because log is a concave function, we have that:
:<math>\sum_i p_i \log \frac{q_i}{p_i} \leq \log \sum_i p_i \frac{q_i}{p_i} = \log \sum_i q_i \leq 0,</math>
where the first inequality is due to Jensen's inequality, and the final inequality holds because <math>\sum_i q_i \leq 1</math>, for the same reason given in the proof above.
Furthermore, since <math>\log</math> is strictly concave, by the equality condition of Jensen's inequality we get equality when
:<math>\frac{q_1}{p_1} = \frac{q_2}{p_2} = \cdots = \frac{q_n}{p_n}</math>
and
:<math>\sum_i q_i = 1.</math>
Suppose that this common ratio is <math>\sigma</math>; then we have that
:<math>1 = \sum_i q_i = \sum_i \sigma p_i = \sigma,</math>
where we use the fact that <math>p</math> and <math>q</math> are probability distributions. Therefore the equality happens when <math>p_i = q_i</math> for all ''i''.
Corollary
The entropy of <math>P</math> is bounded by:
:<math>H(p_1, \ldots , p_n) \leq \log_2 n.</math>
The proof is trivial – simply set <math>q_i = 1/n</math> for all ''i''.
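An illustrative numerical check of the corollary (the example distribution is arbitrary): the entropy of any distribution on <math>n</math> outcomes is at most <math>\log_2 n</math>, and the uniform distribution attains the bound.

<syntaxhighlight lang="python">
import math

p = [0.1, 0.2, 0.3, 0.4]
n = len(p)

h = -sum(pi * math.log2(pi) for pi in p)
assert h <= math.log2(n)                      # corollary: H(P) <= log2(n)

uniform = [1.0 / n] * n
h_uniform = -sum(pi * math.log2(pi) for pi in uniform)
assert abs(h_uniform - math.log2(n)) < 1e-12  # the bound is attained
</syntaxhighlight>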
See also
* Information entropy
* Bregman divergence
* Log sum inequality
References
{{Reflist}}
[[Category:Information theory]]
[[Category:Coding theory]]
[[Category:Inequalities]]
[[Category:Articles containing proofs]]