Large Deviations

In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. While some basic ideas of the theory can be traced to Laplace, the formalization started with insurance mathematics, namely ruin theory with Cramér and Lundberg. A unified formalization of large deviation theory was developed in 1966, in a paper by Varadhan. Large deviations theory formalizes the heuristic ideas of concentration of measures and widely generalizes the notion of convergence of probability measures. Roughly speaking, large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or tail events.


Introductory examples


An elementary example

Consider a sequence of independent tosses of a fair coin. The possible outcomes could be heads or tails. Let us denote the possible outcome of the i-th trial by X_i, where we encode head as 1 and tail as 0. Now let M_N denote the mean value after N trials, namely

: M_N = \frac{1}{N}\sum_{i=1}^{N} X_i.

Then M_N lies between 0 and 1. From the law of large numbers it follows that as N grows, M_N converges to 0.5 = \operatorname{E}[X_i] (the expected value of a single coin toss). Moreover, by the central limit theorem, it follows that M_N is approximately normally distributed for large N. The central limit theorem can provide more detailed information about the behavior of M_N than the law of large numbers. For example, we can approximately find the tail probability P(M_N > x), the probability that M_N is greater than some value x, for a fixed value of N. However, the approximation by the central limit theorem may not be accurate if x is far from \operatorname{E}[X_i] and N is not sufficiently large. Also, it does not provide information about the convergence of the tail probabilities as N \to \infty. The theory of large deviations can provide answers to such problems.

Let us make this statement more precise. For a given value 0.5 < x < 1, let us compute the tail probability P(M_N > x). Define

: I(x) = x \ln x + (1 - x)\ln(1 - x) + \ln 2.

Note that the function I(x) is a convex, nonnegative function that is zero at x = \tfrac{1}{2} and increases as x approaches 1. Up to the additive constant \ln 2, it is the negative of the Bernoulli entropy H(x) = -x \ln x - (1 - x)\ln(1 - x). Equivalently, it is the Kullback–Leibler divergence of the Bernoulli(x) distribution from the fair Bernoulli(\tfrac{1}{2}) distribution. The asymptotic equipartition property, applied to a Bernoulli trial, shows that this is the appropriate function for coin tosses. Then by Chernoff's inequality, it can be shown that

: P(M_N > x) < \exp(-N I(x)).

This bound is rather sharp, in the sense that I(x) cannot be replaced with a larger number which would yield a strict inequality for all positive N. (However, the exponential bound can still be reduced by a subexponential factor on the order of 1/\sqrt{N}. This follows from the Stirling approximation applied to the binomial coefficient appearing in the binomial distribution of the number of heads.) Hence, we obtain the following result:

: P(M_N > x) \approx \exp(-N I(x)).

The probability P(M_N > x) decays exponentially as N \to \infty at a rate depending on x. This formula approximates any tail probability of the sample mean of these i.i.d. coin tosses and describes how it converges as the number of samples increases.
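The Chernoff bound and the exponential rate can be checked numerically for moderate N, because P(M_N > x) for a fair coin is an exact binomial sum. The following is a minimal Python sketch; the function names rate_function and exact_tail are hypothetical and chosen only for illustration. It compares the exact tail with \exp(-N I(x)) and prints -(1/N) \ln P(M_N > x) next to I(x).

```python
import math


def rate_function(x: float) -> float:
    """Fair-coin rate function I(x) = x ln x + (1 - x) ln(1 - x) + ln 2 on [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return math.inf  # the mean of 0/1 outcomes can never leave [0, 1]

    def xlogx(t: float) -> float:
        return t * math.log(t) if t > 0.0 else 0.0  # convention 0 ln 0 = 0

    return xlogx(x) + xlogx(1.0 - x) + math.log(2.0)


def exact_tail(n: int, x: float) -> float:
    """Exact P(M_N > x) for N fair tosses: (number of sequences with k/N > x) / 2^N."""
    favourable = sum(math.comb(n, k) for k in range(n + 1) if k / n > x)
    return favourable / 2 ** n


x = 0.6
for n in (10, 100, 1000):
    p = exact_tail(n, x)
    # Chernoff: p < exp(-N I(x)); large deviations: -(1/N) ln p -> I(x)
    print(n, p, math.exp(-n * rate_function(x)), -math.log(p) / n, rate_function(x))
```

With x = 0.6 the exact tail stays below the Chernoff bound for every N. The gap between -(1/N) \ln P(M_N > x) and I(0.6) ≈ 0.0201 shrinks on the order of (\ln N)/N, which reflects the subexponential prefactor mentioned above.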


Large deviations for sums of independent random variables

In the above example of coin-tossing we explicitly assumed that each toss is an independent trial, and the probability of getting head or tail is always the same. Let X, X_1, X_2, \ldots be independent and identically distributed (i.i.d.) random variables whose common distribution satisfies a certain growth condition. Then the following limit exists:

: \lim_{N \to \infty} \frac{1}{N} \ln P(M_N > x) = -I(x).

Here

: M_N = \frac{1}{N}\sum_{i=1}^{N} X_i,

as before. The function I(\cdot) is called the "rate function" or "Cramér function" or sometimes the "entropy function". The above-mentioned limit means that for large N,

: P(M_N > x) \approx \exp[-N I(x)],

which is the basic result of large deviations theory. If we know the probability distribution of X, an explicit expression for the rate function can be obtained. This is given by a Legendre–Fenchel transformation,

: I(x) = \sup_{\theta > 0} [\theta x - \lambda(\theta)],

where

: \lambda(\theta) = \ln \operatorname{E}[\exp(\theta X)]

is called the cumulant generating function (CGF) and \operatorname{E} denotes the mathematical expectation. If X follows a normal distribution with mean \mu and variance \sigma^2, the rate function becomes a parabola with its apex at the mean: I(x) = (x - \mu)^2 / (2\sigma^2) for x \ge \mu. If \{X_i\} is an irreducible and aperiodic Markov chain, a variant of the basic large deviations result stated above may hold.
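When \lambda(\theta) is known in closed form, the Legendre–Fenchel formula can be evaluated numerically. Below is a minimal Python sketch that assumes the supremum is attained on a bounded interval [0, theta_max]; the default 50 is an arbitrary choice. It uses ternary search, which is valid here because a CGF is convex and so \theta x - \lambda(\theta) is concave in \theta. The helper names and the normal parameters \mu = 1, \sigma = 2 are hypothetical.

```python
import math
from typing import Callable


def legendre_rate(x: float, cgf: Callable[[float], float],
                  theta_max: float = 50.0, iters: int = 200) -> float:
    """Approximate I(x) = sup over 0 <= theta <= theta_max of theta * x - cgf(theta).

    A CGF is convex, so theta * x - cgf(theta) is concave and ternary search
    finds its maximum, assuming the maximiser lies in [0, theta_max].
    """
    def objective(theta: float) -> float:
        return theta * x - cgf(theta)

    lo, hi = 0.0, theta_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1  # for a concave objective the maximum cannot lie in [lo, m1)
        else:
            hi = m2  # ... and here it cannot lie in (m2, hi]
    return objective((lo + hi) / 2.0)


def fair_coin_cgf(theta: float) -> float:
    """lambda(theta) = ln E[exp(theta X)] for X = 0 or 1 with probability 1/2 each."""
    return math.log((1.0 + math.exp(theta)) / 2.0)


def normal_cgf(theta: float, mu: float = 1.0, sigma: float = 2.0) -> float:
    """lambda(theta) = mu * theta + sigma^2 * theta^2 / 2 for X ~ N(mu, sigma^2)."""
    return mu * theta + 0.5 * sigma ** 2 * theta ** 2


for x in (0.6, 0.75, 0.9):
    closed_form = x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)
    print(x, legendre_rate(x, fair_coin_cgf), closed_form)
print(legendre_rate(3.0, normal_cgf), (3.0 - 1.0) ** 2 / (2 * 2.0 ** 2))
```

For the fair coin the numerical transform reproduces I(x) = x \ln x + (1 - x)\ln(1 - x) + \ln 2. For the normal case it gives (x - \mu)^2 / (2\sigma^2) when x > \mu. Below the mean the supremum over \theta > 0 is 0, which matches the fact that P(M_N > x) \to 1 there.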


Moderate deviations for sums of independent random variables

The previous example controlled the probability of the event [M_N > x], that is, the concentration of the law of M_N on the compact set [-x, x]. It is also possible to control the probability of the event [M_N > x / a_N] for some sequence a_N \to \infty, that is, of deviations on a scale x / a_N that shrinks to 0. The following is an example of a moderate deviations principle.

Theorem: Let X_1, X_2, \ldots be a sequence of centered i.i.d. variables with finite variance \sigma^2 such that \ln \operatorname{E}[e^{\lambda X_1}] < \infty for all \lambda \in \mathbb{R}, and define M_N = \frac{1}{N}\sum_{n=1}^{N} X_n. Then for any sequence 1 \ll a_N \ll \sqrt{N} and every x > 0,

: \lim_{N \to \infty} \frac{a_N^2}{N} \ln P(a_N M_N \ge x) = -\frac{x^2}{2\sigma^2}.

In particular, the limit case a_N = \sqrt{N} is the central limit theorem.
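To see the moderate deviations scaling at work, one can take the centered fair coin X_i = \pm 1. Then \sigma^2 = 1 and the predicted limit is -x^2/2. The sketch also uses the hypothetical choice a_N = N^{1/4}, which satisfies 1 \ll a_N \ll \sqrt{N}. This minimal Python sketch computes \ln P(a_N M_N \ge x) exactly in log space with math.lgamma and scales it by a_N^2 / N; the helper names are illustrative only.

```python
import math


def log_pmf_fair(n: int, k: int) -> float:
    """ln[C(n, k) / 2^n], computed with log-gamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1) - n * math.log(2.0))


def log_tail_signs(n: int, s: float) -> float:
    """ln P(S_n >= s), where S_n is a sum of n independent fair +/-1 signs."""
    # S_n = 2K - n with K ~ Binomial(n, 1/2), so S_n >= s  <=>  K >= (n + s) / 2.
    k_min = math.ceil((n + s) / 2)
    terms = [log_pmf_fair(n, k) for k in range(k_min, n + 1)]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def scaled_log_prob(n: int, x: float) -> float:
    """(a_N^2 / N) ln P(a_N M_N >= x) with the illustrative choice a_N = N^(1/4)."""
    a = math.sqrt(math.sqrt(n))  # exact for perfect fourth powers such as 256
    # a_N M_N >= x  <=>  S_N >= x N / a_N, since M_N = S_N / N.
    return (a * a / n) * log_tail_signs(n, x * n / a)


for n in (256, 4096, 65536):
    print(n, scaled_log_prob(n, 1.0))  # predicted limit: -x^2 / (2 sigma^2) = -0.5
```

For x = 1 the scaled values increase towards -1/2 from below as N runs through 256, 4096 and 65536. Convergence is slow because of the polynomial prefactor and the higher-order terms of the underlying rate function.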


Formal definition

Given a Polish space \mathcal{X}, let \{\mathbb{P}_N\} be a sequence of Borel probability measures on \mathcal{X}, let \{a_N\} be a sequence of positive real numbers such that \lim_N a_N = \infty, and finally let I:\mathcal{X} \to [0, \infty] be a lower semicontinuous functional on \mathcal{X}. The sequence \{\mathbb{P}_N\} is said to satisfy a large deviation principle with speed \{a_N\} and rate I if, and only if, for each Borel measurable set E \subset \mathcal{X},

: -\inf_{x \in E^\circ} I(x) \le \liminf_{N} a_N^{-1} \ln \mathbb{P}_N(E) \le \limsup_{N} a_N^{-1} \ln \mathbb{P}_N(E) \le -\inf_{x \in \overline{E}} I(x),

where \overline{E} and E^\circ denote respectively the closure and interior of E.
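The coin-tossing example can be recast in the language of this definition. The derivation below assumes Cramér's theorem: the laws of M_N on \mathcal{X} = \mathbb{R} satisfy a large deviation principle with speed a_N = N and rate I(y) = y \ln y + (1-y)\ln(1-y) + \ln 2 on [0, 1], with I = +\infty elsewhere. From this it recovers the limit behind P(M_N > x) \approx \exp(-N I(x)) for \tfrac{1}{2} < x < 1.

```latex
% Fair coin: \mathbb{P}_N = law of M_N on \mathbb{R}, speed a_N = N,
% I(y) = y\ln y + (1-y)\ln(1-y) + \ln 2 on [0,1], I(y) = +\infty otherwise.
% Take E = (x, \infty) with 1/2 < x < 1, so E^\circ = (x, \infty) and \overline{E} = [x, \infty).
% I is continuous and increasing on [1/2, 1] and infinite outside [0, 1], hence
\inf_{y \in E^\circ} I(y) = \inf_{y \in \overline{E}} I(y) = I(x).
% The lower and upper bounds in the definition therefore coincide:
-I(x) \le \liminf_{N \to \infty} \frac{1}{N} \ln P(M_N > x)
      \le \limsup_{N \to \infty} \frac{1}{N} \ln P(M_N > x) \le -I(x),
% that is,
\lim_{N \to \infty} \frac{1}{N} \ln P(M_N > x) = -I(x).
```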


Brief history

The first rigorous results concerning large deviations are due to the Swedish mathematician Harald Cramér, who applied them to model the insurance business. From the point of view of an insurance company, the earning is at a constant rate per month (the monthly premium) but the claims come randomly. For the company to be successful over a certain period of time (preferably many months), the total earning should exceed the total claim. Thus to estimate the premium you have to ask the following question: "What should we choose as the premium q such that over N months the total claim C = \Sigma X_i should be less than qN?" In other words, the probability P(C > qN), a tail probability of the sample mean C/N, should be small, which is exactly the kind of question addressed by large deviations theory. Cramér gave a solution to this question for i.i.d. random variables, where the rate function is expressed as a power series.

A very incomplete list of mathematicians who have made important advances would include Petrov (Petrov V.V. (1954), Generalization of Cramér's limit theorem, Uspehi Matem. Nauk, v. 9, No 4(62), 195–202, in Russian), Sanov (Sanov I.N. (1957), On the probability of large deviations of random magnitudes, Matem. Sbornik, v. 42 (84), 11–44), S.R.S. Varadhan (who won the Abel Prize for his contributions to the theory), D. Ruelle, O.E. Lanford, Mark Freidlin, Alexander D. Wentzell, Amir Dembo, and Ofer Zeitouni.


Applications

Principles of large deviations may be effectively applied to gather information out of a probabilistic model. Thus, the theory of large deviations finds applications in information theory and risk management. In physics, the best-known applications of large deviations theory arise in thermodynamics and statistical mechanics (in connection with relating entropy to the rate function).


Large deviations and entropy

The rate function is related to the entropy in statistical mechanics. This can be heuristically seen in the following way. In statistical mechanics the entropy of a particular macro-state is related to the number of micro-states which correspond to this macro-state. In our coin-tossing example the mean value M_N could designate a particular macro-state, and the particular sequence of heads and tails which gives rise to a particular value of M_N constitutes a particular micro-state. Loosely speaking, a macro-state having a higher number of micro-states giving rise to it has higher entropy, and a state with higher entropy has a higher chance of being realised in actual experiments. The macro-state with mean value 1/2 (as many heads as tails) has the highest number of micro-states giving rise to it, and it is indeed the state with the highest entropy. In most practical situations we shall indeed obtain this macro-state for large numbers of trials. The "rate function", on the other hand, measures the probability of appearance of a particular macro-state: the smaller the rate function, the higher the chance of a macro-state appearing. In our coin-tossing example the value of the "rate function" for mean value equal to 1/2 is zero. In this way one can see the "rate function" as the negative of the "entropy", up to an additive constant.

There is a relation between the "rate function" in large deviations theory and the Kullback–Leibler divergence; the connection is established by Sanov's theorem (see Sanov, and Novak S.Y. (2011), Extreme value methods with applications to finance, Chapman & Hall/CRC Press, ch. 14.5). In a special case, large deviations are closely related to the concept of Gromov–Hausdorff limits (Kotani M., Sunada T., Large deviation and the tangent cone at infinity of a crystal lattice, Math. Z. 254 (2006), 837–870).
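The micro-state counting argument can be made concrete for coin tossing. The number of micro-states with exactly k = Nx heads is the binomial coefficient C(N, k), and (1/N) \ln C(N, k) approaches the Bernoulli entropy H(x). The rate function is then \ln 2 - H(x), which is also the Kullback–Leibler divergence from the fair coin. Below is a minimal Python sketch, with hypothetical helper names, that uses math.lgamma to count micro-states without forming huge integers.

```python
import math


def bernoulli_entropy(x: float) -> float:
    """Shannon entropy H(x) = -x ln x - (1 - x) ln(1 - x) in nats (0 ln 0 := 0)."""
    return -sum(p * math.log(p) for p in (x, 1.0 - x) if p > 0.0)


def microstate_entropy(n: int, x: float) -> float:
    """(1/N) ln(number of toss sequences with exactly round(N x) heads)."""
    k = round(n * x)
    log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_count / n


def kl_from_fair(x: float) -> float:
    """Kullback-Leibler divergence D(Bernoulli(x) || Bernoulli(1/2)) in nats."""
    return sum(p * math.log(p / 0.5) for p in (x, 1.0 - x) if p > 0.0)


x = 0.6
for n in (100, 10_000, 1_000_000):
    print(n, microstate_entropy(n, x), bernoulli_entropy(x))
# Rate function = ln 2 - entropy = KL divergence from the fair coin.
print(math.log(2.0) - bernoulli_entropy(x), kl_from_fair(x))
```

The per-toss log-count stays slightly below H(x), by about (\ln N)/(2N), and converges to it. The last line shows that \ln 2 - H(x) and the Kullback–Leibler divergence agree, both being about 0.0201 at x = 0.6.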


See also

* Large deviation principle
* Cramér's large deviation theorem
* Chernoff's inequality
* Sanov's theorem
* Contraction principle (large deviations theory), a result on how large deviations principles "push forward"
* Freidlin–Wentzell theorem, a large deviations principle for Itō diffusions
* Legendre transformation; ensemble equivalence is based on this transformation
* Laplace principle, a large deviations principle in R^d
* Laplace's method
* Schilder's theorem, a large deviations principle for Brownian motion
* Varadhan's lemma
* Extreme value theory
* Large deviations of Gaussian random functions


References


Bibliography


* S. R. S. Varadhan, "Special invited paper: Large deviations", The Annals of Probability, Vol. 36, No. 2 (2008), 397–419.
* Hugo Touchette, "A basic introduction to large deviations: Theory, applications, simulations", arXiv:1106.4146.
* R. S. Ellis, Entropy, Large Deviations and Statistical Mechanics, Springer.
* Alan Weiss and Adam Shwartz, Large Deviations for Performance Analysis, Chapman and Hall.
* Amir Dembo and Ofer Zeitouni, Large Deviations Techniques and Applications, Springer.
* Firas Rassoul-Agha and Timo Seppäläinen, A course on large deviations with an introduction to Gibbs measures, Grad. Stud. Math. 162, American Mathematical Society.
* M. I. Freidlin and A. D. Wentzell, Random Perturbations of Dynamical Systems, Springer.
* S. S. Sritharan and P. Sundar, "Large Deviations for Two Dimensional Navier-Stokes Equation with Multiplicative Noise", Stochastic Processes and Their Applications, Vol. 116 (2006), 1636–165
* U. Manna, S. S. Sritharan and P. Sundar, "Large Deviations for the Stochastic Shell Model of Turbulence", NoDEA Nonlinear Differential Equations Appl. 16 (2009), no. 4, 493–52
