[Image: Josiah Willard Gibbs]
In information theory, Gibbs' inequality is a statement about the information entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality. It was first presented by J. Willard Gibbs in the 19th century.
Gibbs' inequality
Suppose that $P = \{p_1, \ldots, p_n\}$ and $Q = \{q_1, \ldots, q_n\}$ are discrete probability distributions. Then

:$-\sum_{i=1}^n p_i \log p_i \leq -\sum_{i=1}^n p_i \log q_i$

with equality if and only if $p_i = q_i$ for $i = 1, \ldots, n$.
Put in words, the information entropy of a distribution $P$ is less than or equal to its cross entropy with any other distribution $Q$.

The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:

:$D_{\text{KL}}(P \parallel Q) \equiv \sum_{i=1}^n p_i \log \frac{p_i}{q_i} \geq 0.$
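For concreteness, the following is a minimal numerical sketch of the inequality, assuming NumPy is available; the two distributions are arbitrary examples chosen for illustration, not taken from the article.

<syntaxhighlight lang="python">
import numpy as np

# Minimal numerical check of Gibbs' inequality for two example distributions:
# the entropy of P is at most the cross-entropy of P relative to Q, and the
# gap is exactly the (non-negative) Kullback-Leibler divergence.
p = np.array([0.5, 0.3, 0.2])          # arbitrary example distribution P
q = np.array([0.25, 0.25, 0.5])        # arbitrary example distribution Q

H_p = -np.sum(p * np.log2(p))          # entropy of P, in bits
H_pq = -np.sum(p * np.log2(q))         # cross-entropy of P relative to Q, in bits
D_kl = np.sum(p * np.log2(p / q))      # KL divergence D(P || Q), in bits

assert H_p <= H_pq
assert D_kl >= 0
assert np.isclose(D_kl, H_pq - H_p)    # the gap in Gibbs' inequality is D(P || Q)
</syntaxhighlight>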
Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
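As a small illustration (again assuming NumPy, with an arbitrary example distribution), the per-outcome surprisal in bits and its average under $P$, which is the entropy on the left-hand side, can be computed as follows.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative example: with base-2 logarithms, -log2(p_i) is the surprisal of
# outcome i in bits, and the entropy is the average surprisal under P.
p = np.array([0.5, 0.3, 0.2])               # arbitrary example distribution
surprisal_bits = -np.log2(p)                # per-outcome surprisal, in bits
avg_surprisal = np.sum(p * surprisal_bits)  # equals the entropy H(P) in bits
print(avg_surprisal)                        # approximately 1.49 bits
</syntaxhighlight>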
Proof
For simplicity, we prove the statement using the natural logarithm, denoted by $\ln$, since

:$\log_b a = \frac{\ln a}{\ln b},$

so the particular logarithm base $b$ that we choose only scales the relationship by the factor $1/\ln b$.
Let $I$ denote the set of all $i$ for which $p_i$ is non-zero. Then, since $\ln x \leq x - 1$ for all $x > 0$, with equality if and only if $x = 1$, we have:

:$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq -\sum_{i \in I} p_i \left( \frac{q_i}{p_i} - 1 \right) = -\sum_{i \in I} q_i + \sum_{i \in I} p_i = -\sum_{i \in I} q_i + 1 \geq 0.$

The last inequality is a consequence of the $p_i$ and $q_i$ being part of a probability distribution. Specifically, the sum of all non-zero $p_i$ is 1. Some non-zero $q_i$, however, may have been excluded, since the choice of indices is conditioned upon the $p_i$ being non-zero. Therefore, the sum of the $q_i$ over $I$ may be less than 1.
So far, over the index set $I$, we have:

:$-\sum_{i \in I} p_i \ln \frac{q_i}{p_i} \geq 0,$

or equivalently

:$-\sum_{i \in I} p_i \ln q_i \geq -\sum_{i \in I} p_i \ln p_i.$
Both sums can be extended to all $i = 1, \ldots, n$, i.e. including $p_i = 0$, by recalling that the expression $p \ln p$ tends to 0 as $p$ tends to 0, and $-\ln q$ tends to $\infty$ as $q$ tends to 0. We arrive at

:$-\sum_{i=1}^n p_i \ln q_i \geq -\sum_{i=1}^n p_i \ln p_i.$
For equality to hold, we require
# $\frac{q_i}{p_i} = 1$ for all $i \in I$ so that the equality $\ln \frac{q_i}{p_i} = \frac{q_i}{p_i} - 1$ holds,
# and $\sum_{i \in I} q_i = 1$, which means $q_i = 0$ if $i \notin I$, that is, $q_i = 0$ if $p_i = 0$.

This can happen if and only if $p_i = q_i$ for $i = 1, \ldots, n$.
Alternative proofs
The result can alternatively be proved using Jensen's inequality, the log sum inequality, or the fact that the Kullback–Leibler divergence is a form of Bregman divergence.
Proof by Jensen's inequality
Because $\log$ is a concave function, we have that:

:$\sum_i p_i \log \frac{q_i}{p_i} \leq \log \sum_i p_i \frac{q_i}{p_i} = \log \sum_i q_i = 0,$

where the first inequality is due to Jensen's inequality, and $Q$ being a probability distribution implies the last equality.

Furthermore, since $\log$ is strictly concave, by the equality condition of Jensen's inequality we get equality when

:$\frac{q_1}{p_1} = \frac{q_2}{p_2} = \cdots = \frac{q_n}{p_n}$

and

:$\sum_i q_i = 1.$

Suppose that this ratio is $\sigma$; then we have that

:$1 = \sum_i q_i = \sum_i \sigma p_i = \sigma,$

where we use the fact that $P$ and $Q$ are probability distributions. Therefore, the equality happens when $P = Q$.
Proof by Bregman divergence
Alternatively, it can be proved by noting that
for all
, with equality holding iff
. Then, sum over the states, we have
with equality holding iff
.
This is because the KL divergence is the
Bregman divergence generated by the function
.
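The pointwise bound used above can be spot-checked numerically; the sketch below assumes NumPy and uses an arbitrary grid of positive values for illustration.

<syntaxhighlight lang="python">
import numpy as np

# Spot-check of the pointwise bound q - p*ln(q) >= p - p*ln(p) for p, q > 0,
# with equality only on the diagonal p == q (illustrative grid of values).
vals = np.linspace(0.05, 2.0, 40)
P, Q = np.meshgrid(vals, vals)
lhs = Q - P * np.log(Q)
rhs = P - P * np.log(P)
assert np.all(lhs >= rhs - 1e-12)   # the bound holds everywhere on the grid
</syntaxhighlight>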
Corollary
The entropy of $P$ is bounded by:

:$H(p_1, \ldots, p_n) \leq \log n.$

The proof is trivial – simply set $q_i = 1/n$ for all $i$.
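A brief numerical sketch (assuming NumPy; the random distribution is only an example) illustrates the bound and the fact that it is attained by the uniform distribution.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative check that H(P) <= log(n), with equality for the uniform
# distribution (entropy measured in nats here).
rng = np.random.default_rng(0)
n = 5
p = rng.dirichlet(np.ones(n))               # a random example distribution on n outcomes
H = -np.sum(p * np.log(p))                  # entropy of p, in nats
assert H <= np.log(n) + 1e-12

uniform = np.full(n, 1.0 / n)
H_uniform = -np.sum(uniform * np.log(uniform))
assert np.isclose(H_uniform, np.log(n))     # equality at the uniform distribution
</syntaxhighlight>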
See also
* Information entropy
* Bregman divergence
* Log sum inequality