In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.
The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions.
This theorem has seen many changes during the formal development of probability theory. Previous versions of the theorem date back to 1811, but in its modern form it was only precisely stated as late as 1920.
In statistics, the CLT can be stated as follows: let X_1, X_2, \dots, X_n denote a statistical sample of size n from a population with expected value (average) \mu and finite positive variance \sigma^2, and let \bar{X}_n denote the sample mean (which is itself a random variable). Then the limit as n \to \infty of the distribution of \sqrt{n}(\bar{X}_n - \mu) is a normal distribution with mean 0 and variance \sigma^2.
In other words, suppose that a large sample of observations is obtained, each observation being randomly produced in a way that does not depend on the values of the other observations, and the average (arithmetic mean) of the observed values is computed. If this procedure is performed many times, resulting in a collection of observed averages, the central limit theorem says that if the sample size is large enough, the probability distribution of these averages will closely approximate a normal distribution.
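The repeated-averaging procedure described above can be sketched in a short simulation. This is an illustration, not part of the source: the population is assumed exponential(1) (mean 1, variance 1), a strongly skewed distribution, yet the sample averages still cluster normally.

```python
import random
import statistics

# Simulation sketch (assumed exponential(1) population, mean 1, variance 1):
# repeatedly draw samples and record each sample average. The CLT predicts
# the averages are approximately normal with mean mu = 1 and standard
# deviation sigma / sqrt(n).
random.seed(42)

n = 1_000       # observations per sample
trials = 2_000  # number of repeated samples

means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

print(statistics.fmean(means))  # close to mu = 1
print(statistics.stdev(means))  # close to 1 / sqrt(1000), about 0.0316
```

A histogram of `means` would show the familiar bell shape even though a histogram of the raw exponential draws is sharply skewed.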
The central limit theorem has several variants. In its common form, the random variables must be independent and identically distributed (i.i.d.). This requirement can be weakened; convergence of the mean to the normal distribution also occurs for non-identical distributions or for non-independent observations if they comply with certain conditions.

The earliest version of this theorem, stating that the normal distribution may be used as an approximation to the binomial distribution, is the de Moivre–Laplace theorem.
Independent sequences
Classical CLT
Let X_1, X_2, \dots, X_n be a sequence of i.i.d. random variables having a distribution with expected value given by \mu and finite variance given by \sigma^2. Suppose we are interested in the sample average

\bar{X}_n \equiv \frac{X_1 + X_2 + \cdots + X_n}{n}.
By the law of large numbers, the sample average \bar{X}_n converges almost surely (and therefore also converges in probability) to the expected value \mu as n \to \infty.
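The law of large numbers can be seen directly in simulation. A minimal sketch, assuming an exponential(1) population so the true expected value is 1:

```python
import random
import statistics

# Sketch of the law of large numbers (assumed exponential(1) population,
# true expected value mu = 1): running averages approach mu as n grows.
random.seed(1)

draws = [random.expovariate(1.0) for _ in range(100_000)]
for n in (10, 1_000, 100_000):
    print(n, statistics.fmean(draws[:n]))  # drifts toward 1
```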
The classical central limit theorem describes the size and the distributional form of the fluctuations around the deterministic number \mu during this convergence. More precisely, it states that as n gets larger, the distribution of the normalized mean \sqrt{n}(\bar{X}_n - \mu), i.e. the difference between the sample average \bar{X}_n and its limit \mu, scaled by the factor \sqrt{n}, approaches the normal distribution with mean 0 and variance \sigma^2. For large enough n, the distribution of \bar{X}_n gets arbitrarily close to the normal distribution with mean \mu and variance \sigma^2 / n. The usefulness of the theorem is that the distribution of \sqrt{n}(\bar{X}_n - \mu) approaches normality regardless of the shape of the distribution of the individual X_i. Formally, the theorem can be stated as follows:
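The formal statement appears to have been dropped during extraction; the standard Lindeberg–Lévy form, reconstructed here in the notation used above, reads:

```latex
\textbf{Lindeberg--L\'evy CLT.} Suppose $X_1, X_2, \dots$ is a sequence of
i.i.d.\ random variables with $\mathbb{E}[X_i] = \mu$ and
$\operatorname{Var}[X_i] = \sigma^2 < \infty$. Then, as $n \to \infty$, the
random variables $\sqrt{n}\,(\bar{X}_n - \mu)$ converge in distribution to a
normal $\mathcal{N}(0, \sigma^2)$:
\[
  \sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \;\xrightarrow{d}\; \mathcal{N}\bigl(0, \sigma^2\bigr).
\]
```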
In the case \sigma > 0, convergence in distribution means that the cumulative distribution functions of \sqrt{n}(\bar{X}_n - \mu) converge pointwise to the cdf of the \mathcal{N}(0, \sigma^2) distribution: for every real number z,

\lim_{n \to \infty} \Pr\left[\sqrt{n}(\bar{X}_n - \mu) \le z\right] = \Phi\!\left(\frac{z}{\sigma}\right),

where \Phi(z) is the standard normal cdf evaluated at z. The convergence is uniform in z in the sense that

\lim_{n \to \infty} \; \sup_{z \in \mathbb{R}} \left| \Pr\left[\sqrt{n}(\bar{X}_n - \mu) \le z\right] - \Phi\!\left(\frac{z}{\sigma}\right) \right| = 0,

where \sup denotes the least upper bound (or supremum) of the set.
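The uniform convergence above can be probed numerically. In this sketch (my construction, assuming an exponential(1) population so \mu = \sigma = 1), the supremum distance between the empirical cdf of the normalized mean and the limiting normal cdf is estimated by simulation:

```python
import math
import random

# Simulation sketch (assumed exponential(1) population, so mu = sigma = 1):
# estimate the uniform (Kolmogorov-style) distance between the cdf of
# sqrt(n) * (sample mean - mu) and the limiting normal cdf Phi(z / sigma).
random.seed(0)

n, trials = 500, 2_000
mu = sigma = 1.0

def normalized_mean() -> float:
    xs = [random.expovariate(1.0) for _ in range(n)]
    return math.sqrt(n) * (sum(xs) / n - mu)

def phi(z: float) -> float:
    # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

samples = sorted(normalized_mean() for _ in range(trials))

# sup over z of |empirical cdf - Phi(z / sigma)|, evaluated at the jump points
sup_dist = max(
    max(abs((i + 1) / trials - phi(z / sigma)),
        abs(i / trials - phi(z / sigma)))
    for i, z in enumerate(samples)
)
print(sup_dist)  # small, and shrinks as n grows
```

The estimate mixes two effects: the genuine cdf gap (which shrinks like 1/\sqrt{n}) and Monte Carlo noise from the finite number of trials.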
Lyapunov CLT
In this variant of the central limit theorem the random variables X_i have to be independent, but not necessarily identically distributed. The theorem also requires that the random variables |X_i| have moments of some order (2 + \delta), and that the rate of growth of these moments is limited by the Lyapunov condition given below.
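The Lyapunov condition itself is missing from the extracted text; in standard notation it reads:

```latex
\textbf{Lyapunov CLT.} Suppose $X_1, X_2, \dots$ are independent random
variables, each with finite expected value $\mu_i$ and variance $\sigma_i^2$,
and define $s_n^2 = \sum_{i=1}^{n} \sigma_i^2$. If for some $\delta > 0$
Lyapunov's condition
\[
  \lim_{n \to \infty} \frac{1}{s_n^{2+\delta}}
  \sum_{i=1}^{n} \mathbb{E}\bigl[\,|X_i - \mu_i|^{2+\delta}\,\bigr] = 0
\]
is satisfied, then $\frac{1}{s_n} \sum_{i=1}^{n} (X_i - \mu_i)$ converges in
distribution to a standard normal random variable as $n \to \infty$.
```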
In practice it is usually easiest to check Lyapunov's condition for \delta = 1.
If a sequence of random variables satisfies Lyapunov's condition, then it also satisfies Lindeberg's condition. The converse implication, however, does not hold.
Lindeberg (-Feller) CLT
In the same setting and with the same notation as above, the Lyapunov condition can be replaced with the following weaker one (from
Lindeberg in 1920).
Suppose that for every \varepsilon > 0,

\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{n} \mathbb{E}\left[ (X_i - \mu_i)^2 \cdot \mathbf{1}_{\{ |X_i - \mu_i| > \varepsilon s_n \}} \right] = 0,

where \mathbf{1}_{\{\cdots\}} is the indicator function. Then the distribution of the standardized sums

Z_n = \frac{1}{s_n} \sum_{i=1}^{n} (X_i - \mu_i)

converges towards the standard normal distribution \mathcal{N}(0, 1).
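A minimal simulation sketch (my construction, not from the source): independent but non-identically distributed bounded variables satisfy Lindeberg's condition whenever s_n grows without bound, so their standardized sums should look approximately standard normal.

```python
import math
import random
import statistics

# Sketch: X_i ~ uniform(-c_i, c_i) with cycling scales c_i, so the variables
# are independent but not identically distributed. Bounded summands with
# s_n -> infinity satisfy Lindeberg's condition, so the standardized sum
# Z_n = (1 / s_n) * sum(X_i) should be approximately standard normal.
random.seed(7)

n, trials = 2_000, 1_000
scales = [1.0 + (i % 3) for i in range(n)]         # c_i cycles through 1, 2, 3
s_n = math.sqrt(sum(c * c / 3.0 for c in scales))  # Var(uniform(-c, c)) = c^2/3

def standardized_sum() -> float:
    return sum(random.uniform(-c, c) for c in scales) / s_n

zs = [standardized_sum() for _ in range(trials)]
print(statistics.fmean(zs))  # close to 0
print(statistics.stdev(zs))  # close to 1
```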
CLT for the sum of a random number of random variables
Rather than summing an integer number
of random variables and taking
, the sum can be of a random number
of random variables, with conditions on
. For example, the following theorem is Corollary 4 of Robbins (1948). It assumes that
is asymptotically normal (Robbins also developed other conditions that lead to the same result).
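The sketch below is not Robbins's Corollary 4 itself; it is my own illustration of a sum with a random number of terms, using a compound Poisson sum, whose standardized value is also asymptotically normal as the Poisson rate grows.

```python
import math
import random
import statistics

# Sketch: S = X_1 + ... + X_N with N ~ Poisson(lam), independent of the
# i.i.d. X_i ~ uniform(0, 1). For a compound Poisson sum, E[S] = lam * E[X]
# and Var(S) = lam * E[X^2], so Z = (S - lam*E[X]) / sqrt(lam * E[X^2])
# is approximately standard normal for large lam.
random.seed(3)

lam = 1_000
mean_x, mean_x2 = 0.5, 1.0 / 3.0  # first two moments of uniform(0, 1)

def poisson_1() -> int:
    # Knuth's algorithm for a single Poisson(1) draw
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= random.random()
        if p < limit:
            return k
        k += 1

def standardized_compound_sum() -> float:
    n_terms = sum(poisson_1() for _ in range(lam))  # N ~ Poisson(lam)
    s = sum(random.random() for _ in range(n_terms))
    return (s - lam * mean_x) / math.sqrt(lam * mean_x2)

zs = [standardized_compound_sum() for _ in range(500)]
print(statistics.fmean(zs))  # close to 0
print(statistics.stdev(zs))  # close to 1
```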
Multidimensional CLT
Proofs that use characteristic functions can be extended to cases where each individual
is a
random vector
In probability, and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge ...
in with mean vector