Berry–Esseen Theorem

In probability theory, the central limit theorem states that, under certain circumstances, the probability distribution of the scaled mean of a random sample converges to a normal distribution as the sample size increases to infinity. Under stronger assumptions, the Berry–Esseen theorem, or Berry–Esseen inequality, gives a more quantitative result, because it also specifies the rate at which this convergence takes place by giving a bound on the maximal error of approximation between the normal distribution and the true distribution of the scaled sample mean. The approximation is measured by the Kolmogorov–Smirnov distance. In the case of independent samples, the convergence rate is ''n''−1/2, where ''n'' is the sample size, and the constant is estimated in terms of the third absolute normalized moment. It is also possible to give non-uniform bounds which become stricter for more extreme events.
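
For illustration, the following minimal Python sketch estimates the Kolmogorov–Smirnov distance between the distribution of the standardized sample mean and the standard normal by simulation; the choice of exponential summands and all numerical parameters are illustrative assumptions, not part of the theorem.

```python
# Monte Carlo sketch (illustrative assumptions: centered exponential
# summands, 50_000 replications) of the n^{-1/2} convergence rate
# quantified by the Berry-Esseen theorem.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def ks_distance_to_normal(n, reps=50_000):
    # Centered exponential(1) summands: mean 0, variance 1.
    x = rng.exponential(1.0, size=(reps, n)) - 1.0
    t = np.sort(x.sum(axis=1) / np.sqrt(n))   # standardized sums
    # Empirical CDF evaluated just after/before each sorted point.
    ecdf_hi = np.arange(1, reps + 1) / reps
    ecdf_lo = np.arange(0, reps) / reps
    phi = norm.cdf(t)
    return max(np.abs(ecdf_hi - phi).max(), np.abs(ecdf_lo - phi).max())

for n in (4, 16, 64, 256):
    print(f"n={n:4d}  KS distance ~ {ks_distance_to_normal(n):.4f}"
          f"  n^(-1/2) = {n ** -0.5:.4f}")
```

The printed distances shrink at roughly the same rate as ''n''−1/2, as the theorem predicts.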


Statement of the theorem

Statements of the theorem vary, as it was independently discovered by two mathematicians, Andrew C. Berry (in 1941) and Carl-Gustav Esseen (1942), who then, along with other authors, refined it repeatedly over subsequent decades.


Identically distributed summands

One version, sacrificing generality somewhat for the sake of clarity, is the following:

:There exists a positive constant ''C'' such that if ''X''1, ''X''2, ..., are i.i.d. random variables with E(''X''1) = 0, E(''X''1²) = ''σ''² > 0, and E(|''X''1|³) = ''ρ'' < ∞ (since the random variables are identically distributed, ''X''2, ''X''3, ... all have the same moments as ''X''1), and if we define

::Y_n = \frac{X_1 + X_2 + \cdots + X_n}{n}

:the sample mean, with ''F''''n'' the cumulative distribution function of

::\frac{Y_n \sqrt{n}}{\sigma},

:and Φ the cumulative distribution function of the standard normal distribution, then for all ''x'' and ''n'',

::\left|F_n(x) - \Phi(x)\right| \le \frac{C\rho}{\sigma^3\sqrt{n}}.\ \ \ \ (1)

That is: given a sequence of independent and identically distributed random variables, each having mean zero and positive variance, if additionally the third absolute moment is finite, then the cumulative distribution functions of the standardized sample mean and the standard normal distribution differ (vertically, on a graph) by no more than the specified amount. Note that the approximation error for all ''n'' (and hence the limiting rate of convergence for indefinite ''n'' sufficiently large) is bounded by the order of ''n''−1/2.

Calculated upper bounds on the constant ''C'' have decreased markedly over the years, from the original value of 7.59 by Esseen in 1942. The estimate ''C'' < 0.4748 follows from the inequality

:\sup_{x\in\mathbb{R}}\left|F_n(x) - \Phi(x)\right| \le \frac{0.33554(\rho + 0.415\sigma^3)}{\sigma^3\sqrt{n}},

since ''σ''³ ≤ ''ρ'' and 0.33554 · 1.415 < 0.4748. However, if ''ρ'' ≥ 1.286''σ''³, then the estimate

:\sup_{x\in\mathbb{R}}\left|F_n(x) - \Phi(x)\right| \le \frac{0.3328(\rho + 0.429\sigma^3)}{\sigma^3\sqrt{n}}

is even tighter. Esseen (1956) proved that the constant also satisfies the lower bound

:C \geq \frac{\sqrt{10}+3}{6\sqrt{2\pi}} \approx 0.40973 \approx \frac{1}{\sqrt{2\pi}} + 0.01079.
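
As a concrete numerical sketch, the right-hand side of (1) can be evaluated directly; the centered Bernoulli summands below and the constant ''C'' = 0.4748 (the upper estimate quoted above) are illustrative choices.

```python
# Sketch: evaluate the Berry-Esseen bound C*rho/(sigma^3*sqrt(n)) for
# centered Bernoulli(p) summands, using C = 0.4748 (an upper estimate;
# the exact optimal constant is unknown).
import math

def berry_esseen_bound(p, n, C=0.4748):
    # X takes value (1-p) with prob. p and -p with prob. 1-p => mean 0.
    sigma2 = p * (1 - p)                        # variance sigma^2
    rho = p * (1 - p) ** 3 + (1 - p) * p ** 3   # rho = E|X|^3
    return C * rho / (sigma2 ** 1.5 * math.sqrt(n))

for n in (10, 100, 1000):
    print(f"n={n:5d}  bound = {berry_esseen_bound(0.1, n):.5f}")
```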


Non-identically distributed summands

:Let ''X''1, ''X''2, ..., be independent random variables with E(''X''''i'') = 0, E(''X''''i''²) = ''σ''''i''² > 0, and E(|''X''''i''|³) = ''ρ''''i'' < ∞. Also, let

::S_n = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}}

:be the normalized ''n''-th partial sum. Denote ''F''''n'' the cdf of ''S''''n'', and Φ the cdf of the standard normal distribution. For the sake of convenience denote

::\vec{\sigma}=(\sigma_1,\ldots,\sigma_n),\ \vec{\rho}=(\rho_1,\ldots,\rho_n).

:In 1941, Andrew C. Berry proved that for all ''n'' there exists an absolute constant ''C''1 such that

::\sup_{x\in\mathbb{R}}\left|F_n(x) - \Phi(x)\right| \le C_1\cdot\psi_1,\ \ \ \ (2)

:where

::\psi_1=\psi_1\big(\vec{\sigma},\vec{\rho}\big)=\Big(\sum\limits_{i=1}^n \sigma_i^2\Big)^{-1/2}\cdot\max_{1\le i\le n}\frac{\rho_i}{\sigma_i^2}.

:Independently, in 1942, Carl-Gustav Esseen proved that for all ''n'' there exists an absolute constant ''C''0 such that

::\sup_{x\in\mathbb{R}}\left|F_n(x) - \Phi(x)\right| \le C_0\cdot\psi_0,\ \ \ \ (3)

:where

::\psi_0=\psi_0\big(\vec{\sigma},\vec{\rho}\big)=\Big(\sum\limits_{i=1}^n \sigma_i^2\Big)^{-3/2}\cdot\sum\limits_{i=1}^n\rho_i.

It is easy to make sure that ψ0 ≤ ψ1: indeed, \sum_i\rho_i = \sum_i \sigma_i^2\,(\rho_i/\sigma_i^2) \le \big(\max_i \rho_i/\sigma_i^2\big)\sum_i\sigma_i^2. Due to this circumstance, inequality (3) is conventionally called the Berry–Esseen inequality, and the quantity ψ0 is called the Lyapunov fraction of the third order. Moreover, in the case where the summands ''X''1, ..., ''X''''n'' have identical distributions,

::\psi_0=\psi_1=\frac{\rho}{\sigma^3\sqrt{n}},

and thus the bounds stated by inequalities (1), (2) and (3) coincide apart from the constant. Regarding ''C''0, obviously, the lower bound established by Esseen (1956) remains valid:

:C_0\geq\frac{\sqrt{10}+3}{6\sqrt{2\pi}} = 0.4097\ldots

The lower bound is exactly reached only for certain Bernoulli distributions (see Esseen (1956) for their explicit expressions). The upper bounds for ''C''0 were subsequently lowered from Esseen's original estimate 7.59 to 0.5600.
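
The quantities ψ0 and ψ1 are straightforward to compute; the following sketch uses arbitrary example values for the moment sequences (chosen so that ρ''i'' ≥ ''σ''''i''³, as Lyapunov's inequality requires) and confirms ψ0 ≤ ψ1 numerically.

```python
# Sketch: compute psi_0 and psi_1 from the Berry (1941) and Esseen (1942)
# bounds for example moment sequences (illustrative assumed values).
import numpy as np

sigma = np.array([1.0, 2.0, 0.5, 1.5])   # sigma_i (assumed example values)
rho   = np.array([1.2, 9.0, 0.2, 4.0])   # rho_i = E|X_i|^3 (assumed values)

s2 = np.sum(sigma ** 2)                   # sum of variances
psi1 = s2 ** -0.5 * np.max(rho / sigma ** 2)
psi0 = s2 ** -1.5 * np.sum(rho)

print(psi0, psi1, psi0 <= psi1)           # psi_0 <= psi_1 always holds
```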


Sum of a random number of random variables

Berry–Esseen theorems exist for the sum of a random number of random variables. The following is Theorem 1 from Korolev (1989), substituting in the constants from Remark 3. It is only a portion of the results that they established:

:Let \{X_i\} be independent, identically distributed random variables with E(X_i) = \mu, \operatorname{Var}(X_i) = \sigma^2, E|X_i - \mu|^3 = \kappa^3. Let N be a non-negative integer-valued random variable, independent from \{X_i\}. Let S_N = X_1 + \cdots + X_N, and define

:: \Delta = \sup_{z\in\mathbb{R}} \left| P\left( \frac{S_N - \mu E[N]}{\sigma\sqrt{E[N]}} \leq z \right) - \Phi(z) \right|.

:Then

:: \Delta \leq 3.8696\frac{\kappa^3}{\sigma^3\sqrt{E[N]}} + 1.0395\frac{E|N - E[N]|}{E[N]} + 0.2420\frac{|\mu|\,E|N - E[N]|}{\sigma\sqrt{E[N]}}.
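
A Monte Carlo sketch of Δ, assuming Poisson-distributed ''N'' and exponential summands (both illustrative choices; the centering by μE[''N''] and scaling by σ√E[''N''] follow the definition of Δ above):

```python
# Sketch: estimate Delta = sup_z |P((S_N - mu*E[N])/(sigma*sqrt(E[N])) <= z)
# - Phi(z)| for exponential(1) summands and Poisson N (illustrative choices).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0          # exponential(1): mean 1, standard deviation 1
lam = 50.0                    # E[N] for Poisson-distributed N
reps = 100_000

n = rng.poisson(lam, size=reps)
# Sum of k iid exponential(1) variables is Gamma(k, 1); an empty sum is 0.
s = np.where(n > 0, rng.gamma(np.maximum(n, 1), 1.0), 0.0)
t = np.sort((s - mu * lam) / (sigma * np.sqrt(lam)))  # standardized sums

ecdf_hi = np.arange(1, reps + 1) / reps
ecdf_lo = np.arange(0, reps) / reps
phi = norm.cdf(t)
delta = max(np.abs(ecdf_hi - phi).max(), np.abs(ecdf_lo - phi).max())
print(f"estimated Delta ~ {delta:.4f}")
```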


Multidimensional version

As with the multidimensional central limit theorem, there is a multidimensional version of the Berry–Esseen theorem.

:Let X_1,\dots,X_n be independent \mathbb{R}^d-valued random vectors each having mean zero. Write S_n = \sum_{i=1}^n X_i and assume \Sigma_n = \operatorname{Cov}[S_n] is invertible. Let Z_n\sim\operatorname{N}(0,\Sigma_n) be a ''d''-dimensional Gaussian with the same mean and covariance matrix as S_n. Then for all convex sets U\subseteq\mathbb{R}^d,

::\big|\Pr[S_n\in U] - \Pr[Z_n\in U]\big| \le C d^{1/4} \gamma_n,

:where C is a universal constant and \gamma_n=\sum_{i=1}^n \operatorname{E}\big[\big\|\Sigma_n^{-1/2}X_i\big\|_2^3\big] (the third power of the L2 norm).

The dependency on d^{1/4} is conjectured to be optimal, but might not be.
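
The quantity γ''n'' is easy to estimate by simulation; the sketch below assumes i.i.d. summands uniform on [−1, 1]^''d'', an illustrative choice for which Σ''n'' is known in closed form.

```python
# Sketch: estimate gamma_n = sum_i E||Sigma_n^{-1/2} X_i||_2^3 by Monte Carlo
# for i.i.d. mean-zero vectors uniform on [-1,1]^d (illustrative choice).
import numpy as np

rng = np.random.default_rng(2)
d, n, reps = 3, 100, 50_000

# For iid uniform[-1,1] coordinates: Cov[X_i] = (1/3) I, hence
# Sigma_n = Cov[S_n] = (n/3) I and Sigma_n^{-1/2} = sqrt(3/n) I.
x = rng.uniform(-1.0, 1.0, size=(reps, d))
norms3 = (np.sqrt(3.0 / n) * np.linalg.norm(x, axis=1)) ** 3
gamma_n = n * norms3.mean()   # n identically distributed summands
print(f"gamma_n ~ {gamma_n:.4f}")
```

Because each summand contributes a factor (3/''n'')^{3/2}, γ''n'' here shrinks like ''n''−1/2, recovering the one-dimensional rate.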


Non-uniform bounds

The bounds given above consider the maximal difference between the cdf's. They are 'uniform' in that they do not depend on x and quantify the uniform convergence F_n \to \Phi. However, because F_n(x) - \Phi(x) goes to zero for large |x| by general properties of cdf's, these uniform bounds overestimate the difference for such arguments, even though they are sharp in general. It is therefore desirable to obtain upper bounds which depend on x and in this way become smaller for large |x|. One such result, since improved multiple times, is the following.

:As above, let ''X''1, ''X''2, ..., be independent random variables with E(''X''''i'') = 0, E(''X''''i''²) = ''σ''''i''² > 0, and E(|''X''''i''|³) = ''ρ''''i'' < ∞. Also, let \sigma^2 = \sum_{i=1}^{n} \sigma_i^2 and

::S_n = \frac{X_1 + X_2 + \cdots + X_n}{\sigma}

:be the normalized ''n''-th partial sum. Denote ''F''''n'' the cdf of ''S''''n'', and Φ the cdf of the standard normal distribution. Then

::|F_n(x) - \Phi(x)| \leq \frac{C_3}{\sigma^3 (1+|x|)^3} \cdot \sum_{i=1}^n \rho_i,

:where C_3 is a universal constant.

The constant C_3 may be taken as 114.667. Moreover, if the X_i are identically distributed, it can be taken as C + 8(1+\mathrm{e}), where C is the constant from the first theorem above, and hence 30.2211 works.
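
The following sketch contrasts the uniform bound (with ''C'' = 0.4748) and the non-uniform bound (with ''C''3 = 30.2211) in the i.i.d. case for illustrative moment values, showing the non-uniform bound overtaking the uniform one for large |x|.

```python
# Sketch: uniform vs. non-uniform Berry-Esseen bounds in the iid case,
# using the constants quoted above (0.4748 uniform, 30.2211 non-uniform)
# and illustrative moment values.
import math

sigma1, rho1, n = 1.0, 2.0, 100   # per-summand moments and sample size

uniform = 0.4748 * rho1 / (sigma1 ** 3 * math.sqrt(n))
for x in (0.0, 1.0, 2.0, 4.0, 8.0):
    # In the iid case sum(rho_i) = n*rho1 and the theorem's sigma is
    # sqrt(n)*sigma1, which again yields the 1/sqrt(n) rate.
    nonuniform = (30.2211 * n * rho1
                  / ((math.sqrt(n) * sigma1) ** 3 * (1 + abs(x)) ** 3))
    print(f"x={x:4.1f}  uniform={uniform:.4f}  non-uniform={nonuniform:.4f}")
```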


See also

* Chernoff's inequality
* Edgeworth series
* List of inequalities
* List of mathematical theorems
* Concentration inequality




Bibliography

* Durrett, Richard (1991). ''Probability: Theory and Examples''. Pacific Grove, CA: Wadsworth & Brooks/Cole.
* Feller, William (1972). ''An Introduction to Probability Theory and Its Applications, Volume II'' (2nd ed.). New York: John Wiley & Sons.
* Manoukian, Edward B. (1986). ''Modern Concepts and Theorems of Mathematical Statistics''. New York: Springer-Verlag.
* Serfling, Robert J. (1980). ''Approximation Theorems of Mathematical Statistics''. New York: John Wiley & Sons.


External links

* Gut, Allan & Holst, Lars. ''Carl-Gustav Esseen'', retrieved Mar. 15, 2004.