Sanov's theorem

In mathematics and information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.

Let ''A'' be a set of probability distributions over an alphabet ''X'', and let ''q'' be an arbitrary distribution over ''X'' (where ''q'' may or may not be in ''A''). Suppose we draw ''n'' i.i.d. samples from ''q'', represented by the vector x^n = (x_1, x_2, \ldots, x_n). Then, we have the following bound on the probability that the empirical measure \hat{p}_{x^n} of the samples falls within the set ''A'':

:q^n(\hat{p}_{x^n} \in A) \le (n+1)^{|X|} 2^{-n D_{\mathrm{KL}}(p^* \| q)},

where
* q^n is the joint probability distribution on X^n,
* p^* is the information projection of ''q'' onto ''A'', and
* D_{\mathrm{KL}}(P \| Q), the KL divergence from ''Q'' to ''P'', is given by:

:D_{\mathrm{KL}}(P \| Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)}.

In words, the probability of drawing an atypical distribution is bounded by a function of the KL divergence from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection.

Furthermore, if ''A'' is a closed set, then

:\lim_{n \to \infty} \frac{1}{n} \log q^n(\hat{p}_{x^n} \in A) = -D_{\mathrm{KL}}(p^* \| q).


Technical statement

Define:
* \Sigma is a finite set with size \geq 2, understood as the "alphabet".
* \Delta(\Sigma) is the simplex spanned by the alphabet; it is a subset of \R^\Sigma.
* L_n is a random variable taking values in \Delta(\Sigma): take n samples from the distribution \mu, then L_n is the frequency probability vector for the sample.
* \mathcal L_n is the space of values that L_n can take; in other words, it is \{(a_1/n, \ldots, a_{|\Sigma|}/n) : a_i \in \N, \ a_1 + \cdots + a_{|\Sigma|} = n\}.

Then, Sanov's theorem states:
* For every measurable subset S \subseteq \Delta(\Sigma),
:-\inf_{\nu \in \mathrm{int}(S)} D(\nu \| \mu) \leq \liminf_n \frac 1n \ln P_\mu(L_n \in S) \leq \limsup_n \frac 1n \ln P_\mu(L_n \in S) \leq -\inf_{\nu \in \mathrm{cl}(S)} D(\nu \| \mu)
* For every open subset U \subseteq \Delta(\Sigma),
:-\lim_n \inf_{\nu \in U \cap \mathcal L_n} D(\nu \| \mu) = \lim_n \frac 1n \ln P_\mu(L_n \in U) = -\inf_{\nu \in U} D(\nu \| \mu)

Here, \mathrm{int}(S) means the interior, and \mathrm{cl}(S) means the closure.
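For a finite alphabet the statement can be checked directly by enumerating the type space \mathcal L_n. The sketch below is a worked example under assumed choices (Σ = {0, 1, 2}, μ = (0.2, 0.3, 0.5), and S = {ν : ν(0) ≥ 1/2}): it computes P_\mu(L_n \in S) exactly as a sum of multinomial probabilities over the types lying in S, and compares the per-sample log-probability with the smallest divergence over S ∩ \mathcal L_n.

```python
# Sketch with assumed parameters: 3-letter alphabet, mu = (0.2, 0.3, 0.5),
# S = { nu : nu[0] >= 0.5 }. Enumerates the type space L_n, computes
# P_mu(L_n in S) exactly, and compares (1/n) ln P with
# -min_{nu in S cap L_n} D(nu || mu) (natural log, i.e. nats).
import math

mu = (0.2, 0.3, 0.5)  # sampling distribution on Sigma = {0, 1, 2}

def kl(nu, mu):
    """D(nu || mu) in nats, with the convention 0 * ln(0/b) = 0."""
    return sum(a * math.log(a / b) for a, b in zip(nu, mu) if a > 0)

def in_S(nu):
    return nu[0] >= 0.5  # event: letter 0 appears at least half the time

for n in (20, 80, 320):
    prob = 0.0
    best_rate = float("inf")
    # Enumerate all types (a/n, b/n, c/n) with a + b + c = n.
    for a in range(n + 1):
        for b in range(n - a + 1):
            c = n - a - b
            nu = (a / n, b / n, c / n)
            if not in_S(nu):
                continue
            best_rate = min(best_rate, kl(nu, mu))
            # Multinomial probability of observing this exact type.
            prob += (math.comb(n, a) * math.comb(n - a, b)
                     * mu[0] ** a * mu[1] ** b * mu[2] ** c)
    print(f"n={n:4d}  (1/n) ln P = {math.log(prob) / n:+.4f}"
          f"   -min D over S: {-best_rate:+.4f}")
```

Since this S is closed with nonempty interior, the bracketing infima in the theorem coincide, and both printed columns converge to the same limit, about -0.2231 nats, attained at the projection ν* = (0.5, 0.1875, 0.3125).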


References

* Sanov, I. N. (1957) "On the probability of large deviations of random variables". ''Mat. Sbornik'' 42(84), No. 1, 11–44.
* Санов, И. Н. (1957) "О вероятности больших отклонений случайных величин". ''Математический сборник'' 42(84), No. 1, 11–44.