In information theory, the information projection or I-projection of a probability distribution ''q'' onto a set of distributions ''P'' is
:<math>p^* = \arg\min_{p \in P} D_{\mathrm{KL}}(p \parallel q).</math>
where <math>D_{\mathrm{KL}}(p \parallel q)</math> is the Kullback–Leibler divergence from ''q'' to ''p''. Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection <math>p^*</math> is the "closest" distribution to ''q'' of all the distributions in ''P''.
The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when ''P'' is convex:
:<math>D_{\mathrm{KL}}(p \parallel q) \geq D_{\mathrm{KL}}(p \parallel p^*) + D_{\mathrm{KL}}(p^* \parallel q) \quad \text{for all } p \in P.</math>
This inequality can be interpreted as an information-geometric version of Pythagoras' triangle-inequality theorem, where KL divergence is viewed as squared distance in a Euclidean space.
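For comparison, the Euclidean counterpart of this statement is the standard property of projection onto a closed convex set <math>C \subset \mathbb{R}^n</math>: if <math>p^*</math> is the point of <math>C</math> closest to <math>q</math>, then
:<math>\|p - q\|^2 \geq \|p - p^*\|^2 + \|p^* - q\|^2 \quad \text{for all } p \in C.</math>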
It is worthwhile to note that since <math>D_{\mathrm{KL}}(p \parallel q) \geq 0</math> and <math>D_{\mathrm{KL}}(p \parallel q)</math> is continuous in ''p'', if ''P'' is closed and non-empty, then there exists at least one minimizer to the optimization problem framed above. Furthermore, if ''P'' is convex, then the optimal distribution is unique.
The reverse I-projection, also known as moment projection or M-projection, is
:<math>p^* = \arg\min_{p \in P} D_{\mathrm{KL}}(q \parallel p).</math>
Since the KL divergence is not symmetric in its arguments, the I-projection and the M-projection exhibit different behavior. For the I-projection, <math>p^*(x)</math> will typically under-estimate the support of <math>q(x)</math> and will lock onto one of its modes; this is because <math>p^*(x) = 0</math> wherever <math>q(x) = 0</math>, which keeps <math>D_{\mathrm{KL}}(p^* \parallel q)</math> finite. For the M-projection, <math>p^*(x)</math> will typically over-estimate the support of <math>q(x)</math>; this is because <math>p^*(x) > 0</math> wherever <math>q(x) > 0</math>, which keeps <math>D_{\mathrm{KL}}(q \parallel p^*)</math> finite.
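The contrast can be made concrete with a small numerical sketch. In the example below, the target is an arbitrarily chosen bimodal mixture and the approximating family consists of univariate Gaussians; on this family the M-projection reduces to matching the target's mean and variance (mass-covering), while the I-projection, found by direct numerical minimization, settles on a single mode (mode-seeking).

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

xs = np.linspace(-10.0, 10.0, 4001)                            # integration grid
dx = xs[1] - xs[0]
target = 0.5 * norm.pdf(xs, -3, 1) + 0.5 * norm.pdf(xs, 3, 1)  # bimodal q

def kl(a, b):
    """Discretized D_KL(a || b) for densities tabulated on the grid."""
    mask = a > 1e-300
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])) * dx)

# M-projection onto {N(mu, sigma^2)}: minimizing D_KL(q || p) over the Gaussian
# family amounts to matching the target's mean and variance.
mu_m = float(np.sum(xs * target) * dx)
var_m = float(np.sum((xs - mu_m) ** 2 * target) * dx)
print(f"M-projection: mu = {mu_m:.2f}, sigma = {np.sqrt(var_m):.2f}")

# I-projection: minimize the reverse divergence D_KL(p || q) numerically.
def reverse_kl(params):
    mu, log_sigma = params
    return kl(norm.pdf(xs, mu, np.exp(log_sigma)), target)

res = minimize(reverse_kl, x0=[1.0, 0.0], method="Nelder-Mead")
print(f"I-projection: mu = {res.x[0]:.2f}, sigma = {np.exp(res.x[1]):.2f}")
</syntaxhighlight>

Starting the second minimization near the other mode yields the mirror-image solution; the Gaussian family is not a convex set of distributions, so the uniqueness guarantee above does not apply.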
The reverse I-projection plays a fundamental role in the construction of optimal
e-variables.
The concept of information projection can be extended to arbitrary
''f''-divergences and other
divergences.
See also
* Sanov's theorem
References
* K. Murphy, ''Machine Learning: A Probabilistic Perspective'', The MIT Press, 2012.