In
mathematics
Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...
and
information theory
Information theory is the scientific study of the quantification (science), quantification, computer data storage, storage, and telecommunication, communication of information. The field was originally established by the works of Harry Nyquist a ...
, Sanov's theorem gives a bound on the probability of observing an
atypical sequence of samples from a given
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
. In the language of
large deviations theory, Sanov's theorem identifies the
rate function for large deviations of the
empirical measure of a sequence of
i.i.d. random variables.
Let ''A'' be a set of probability distributions over an alphabet ''X'', and let ''q'' be an arbitrary distribution over ''X'' (where ''q'' may or may not be in ''A''). Suppose we draw ''n'' i.i.d. samples from ''q'', represented by the vector
. Then, we have the following bound on the probability that the empirical measure
of the samples falls within the set ''A'':
:
,
where
*
is the joint probability distribution on
, and
*
is the
information projection In information theory, the information projection or I-projection of a probability distribution ''q'' onto a set of distributions ''P'' is
:p^* = \underset \operatorname_(p, , q).
where D_ is the Kullback–Leibler divergence from ''q'' to ''p'' ...
of ''q'' onto ''A''.
In words, the probability of drawing an atypical distribution is bounded by a function of the
KL divergence from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection.
Furthermore, if ''A'' is a closed set, then
:
Technical statement
Define:
*
is a finite set with size
. Understood as “alphabet”.
*
is the simplex spanned by the alphabet. It is a subset of
.
*
is a random variable taking values in
. Take
samples from the distribution
, then
is the frequency probability vector for the sample.
*
is the space of values that
can take. In other words, it is
Then, Sanov's theorem states:
* For every measurable subset
,
* For every open subset
,
Here,
means the
interior
Interior may refer to:
Arts and media
* ''Interior'' (Degas) (also known as ''The Rape''), painting by Edgar Degas
* ''Interior'' (play), 1895 play by Belgian playwright Maurice Maeterlinck
* ''The Interior'' (novel), by Lisa See
* Interior de ...
, and
means the
closure.
References
*
*Sanov, I. N. (1957) "On the probability of large deviations of random variables". ''Mat. Sbornik'' 42(84), No. 1, 11–44.
*Санов, И. Н. (1957) "О вероятности больших отклонений случайных величин". ''МАТЕМАТИЧЕСКИЙ СБОРНИК' 42(84), No. 1, 11–44.
Information theory
Probabilistic inequalities
{{probability-stub