HOME

TheInfoList



OR:

In
mathematics Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...
and
information theory Information theory is the scientific study of the quantification (science), quantification, computer data storage, storage, and telecommunication, communication of information. The field was originally established by the works of Harry Nyquist a ...
, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given
probability distribution In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables. Let ''A'' be a set of probability distributions over an alphabet ''X'', and let ''q'' be an arbitrary distribution over ''X'' (where ''q'' may or may not be in ''A''). Suppose we draw ''n'' i.i.d. samples from ''q'', represented by the vector x^n = x_1, x_2, \ldots, x_n. Then, we have the following bound on the probability that the empirical measure \hat_ of the samples falls within the set ''A'': :q^n(\hat_\in A) \le (n+1)^ 2^, where * q^n is the joint probability distribution on X^n, and * p^* is the
information projection In information theory, the information projection or I-projection of a probability distribution ''q'' onto a set of distributions ''P'' is :p^* = \underset \operatorname_(p, , q). where D_ is the Kullback–Leibler divergence from ''q'' to ''p'' ...
of ''q'' onto ''A''. In words, the probability of drawing an atypical distribution is bounded by a function of the KL divergence from the true distribution to the atypical one; in the case that we consider a set of possible atypical distributions, there is a dominant atypical distribution, given by the information projection. Furthermore, if ''A'' is a closed set, then :\lim_\frac\log q^n(\hat_\in A) = -D_(p^*, , q).


Technical statement

Define: * \Sigma is a finite set with size \geq 2. Understood as “alphabet”. * \Delta(\Sigma) is the simplex spanned by the alphabet. It is a subset of \R^\Sigma. * L_n is a random variable taking values in \Delta(\Sigma). Take n samples from the distribution \mu, then L_n is the frequency probability vector for the sample. * \mathcal L_n is the space of values that L_n can take. In other words, it is \Then, Sanov's theorem states: * For every measurable subset S \in \Delta(\Sigma),-\inf_ D(\nu\, \mu) \leq \liminf_n \frac 1n \ln P_\mu(L_n \in S) \leq \limsup_n \frac 1n \ln P_\mu(L_n \in S) \leq -\inf_ D(\nu \, \mu) * For every open subset U \in \Delta(\Sigma),-\lim_n \lim_ D(\nu\, \mu) = \lim_n \frac 1n \ln P_\mu(L_n \in S) = -\inf_ D(\nu\, \mu) Here, int(S) means the
interior Interior may refer to: Arts and media * ''Interior'' (Degas) (also known as ''The Rape''), painting by Edgar Degas * ''Interior'' (play), 1895 play by Belgian playwright Maurice Maeterlinck * ''The Interior'' (novel), by Lisa See * Interior de ...
, and cl(S) means the closure.


References

* *Sanov, I. N. (1957) "On the probability of large deviations of random variables". ''Mat. Sbornik'' 42(84), No. 1, 11–44. *Санов, И. Н. (1957) "О вероятности больших отклонений случайных величин". ''МАТЕМАТИЧЕСКИЙ СБОРНИК' 42(84), No. 1, 11–44. Information theory Probabilistic inequalities {{probability-stub