In probability theory, the law of large numbers is a mathematical law that states that the average of the results obtained from a large number of independent random samples converges to the true value, if it exists. More formally, the law of large numbers states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.

The law of large numbers is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. Importantly, the law applies (as the name indicates) only when a ''large number'' of observations are considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the gambler's fallacy).

The law of large numbers only applies to the ''average'' of the results obtained from repeated trials and claims that this average converges to the expected value; it does not claim that the ''sum'' of ''n'' results gets close to the expected value times ''n'' as ''n'' increases. Throughout its history, many mathematicians have refined this law. Today, the law of large numbers is used in many fields including statistics, probability theory, economics, and insurance.

Examples

For example, a single roll of a six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal probability. Therefore, the expected value of the roll is:

\frac{1+2+3+4+5+6}{6} = 3.5

According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) will approach 3.5, with the precision increasing as more dice are rolled.

It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the theoretical probability of success, and the average of ''n'' such variables (assuming they are independent and identically distributed (i.i.d.)) is precisely the relative frequency.

For example, a fair coin toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the proportion of heads in a "large" number of coin flips "should be" roughly 1/2. In particular, the proportion of heads after ''n'' flips will almost surely converge to 1/2 as ''n'' approaches infinity.

Although the proportion of heads (and tails) approaches 1/2, almost surely the absolute difference in the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected difference grows, but at a slower rate than the number of flips.

Another good example of the law of large numbers is the Monte Carlo method. These methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The larger the number of repetitions, the better the approximation tends to be. This method is important mainly because other approaches are sometimes difficult or impossible to use.
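The die and coin examples above can be reproduced with a short simulation. This is a minimal sketch using only Python's standard library; the function names (`running_means`, `simulate_dice`, `simulate_coin`) and the fixed seeds are illustrative choices, not part of any standard API.

```python
import random


def running_means(values):
    """Return the running sample means X̄_1, X̄_2, ... of an iterable of numbers."""
    means, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


def simulate_dice(n, seed=0):
    """Roll a fair six-sided die n times and return the running averages."""
    rng = random.Random(seed)
    return running_means(rng.randint(1, 6) for _ in range(n))


def simulate_coin(n, seed=0):
    """Flip a fair coin n times; return (proportion of heads, |heads - tails|)."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n, abs(heads - (n - heads))


if __name__ == "__main__":
    dice_means = simulate_dice(100_000)
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"dice: n={n:>6}  average={dice_means[n - 1]:.4f}  (expected value 3.5)")
    for n in (100, 10_000, 1_000_000):
        proportion, diff = simulate_coin(n)
        print(f"coin: n={n:>7}  heads={proportion:.4f}  |H-T|={diff:>5}  |H-T|/n={diff / n:.5f}")
```

With a fixed seed the output is reproducible. Across seeds, the die average typically lands closer to 3.5 as ''n'' grows, and |H − T| tends to grow while |H − T|/''n'' shrinks, consistent with the description above.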

Limitation

The average of the results obtained from a large number of trials may fail to converge in some cases. For instance, the average of ''n'' results taken from the Cauchy distribution or some Pareto distributions (''α'' < 1) will not converge as ''n'' becomes larger; the reason is heavy tails. The Cauchy distribution and the Pareto distribution represent two cases: the Cauchy distribution does not have an expectation, whereas the expectation of the Pareto distribution (''α'' < 1) is infinite. One way to generate the Cauchy-distributed example is where the random numbers equal the tangent of an angle uniformly distributed between −90° and +90°. The median is zero, but the expected value does not exist, and indeed the average of ''n'' such variables has the same distribution as one such variable. It does not converge in probability toward zero (or any other value) as ''n'' goes to infinity.

If the trials embed a selection bias, typical in human economic/rational behaviour, the law of large numbers does not help in solving the bias: even if the number of trials is increased, the selection bias remains.
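A minimal sketch of the Cauchy example, assuming Python's `random` module as the source of uniform angles. `cauchy_sample` draws the tangent of a uniformly distributed angle, as described above, and a uniform distribution on [−1, 1] is included only as a contrast case with a finite mean; both helper names are hypothetical.

```python
import math
import random


def cauchy_sample(rng):
    """Tangent of an angle drawn uniformly between -90° and +90° (standard Cauchy)."""
    return math.tan(rng.uniform(-math.pi / 2, math.pi / 2))


def running_means(values):
    """Sample mean after each new value."""
    means, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


if __name__ == "__main__":
    rng = random.Random(1)
    n_max = 100_000
    cauchy = running_means(cauchy_sample(rng) for _ in range(n_max))
    # Contrast case: uniform on [-1, 1] has expected value 0, so its running mean settles.
    uniform = running_means(rng.uniform(-1.0, 1.0) for _ in range(n_max))
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n={n:>6}  Cauchy average={cauchy[n - 1]: .4f}  uniform average={uniform[n - 1]: .4f}")
```

Because a single huge draw can shift the Cauchy average at any stage, the Cauchy column typically fails to settle, whereas the uniform column drifts toward 0.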

History

The Italian mathematician Gerolamo Cardano (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials. This was then formalized as a law of large numbers. A special form of the law of large numbers (for a binary random variable) was first proved by Jacob Bernoulli. It took him over 20 years to develop a sufficiently rigorous mathematical proof, which was published in his ''Ars Conjectandi'' (''The Art of Conjecturing'') in 1713. He named this his "golden theorem", but it became generally known as "Bernoulli's theorem". This should not be confused with Bernoulli's principle, named after Jacob Bernoulli's nephew Daniel Bernoulli. In 1837, S. D. Poisson further described it under the name ''la loi des grands nombres'' ("the law of large numbers"). Thereafter, it was known under both names, but the "law of large numbers" is most frequently used.

After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev, Markov, Borel, Cantelli, Kolmogorov and Khinchin. Markov showed that the law can apply to a random variable that does not have a finite variance under some other weaker assumption, and Khinchin showed in 1929 that if the series consists of independent identically distributed random variables, it suffices that the expected value exists for the weak law of large numbers to be true. These further studies have given rise to two prominent forms of the law of large numbers. One is called the "weak" law and the other the "strong" law, in reference to two different modes of convergence of the cumulative sample means to the expected value; in particular, as explained below, the strong form implies the weak.

Forms

There are two different versions of the law of large numbers that are described below. They are called the ''strong law of large numbers'' and the ''weak law of large numbers''. Stated for the case where ''X''₁, ''X''₂, ... is an infinite sequence of independent and identically distributed (i.i.d.) Lebesgue integrable random variables with expected value E(''X''₁) = E(''X''₂) = ... = ''μ'', both versions of the law state that the sample average

\overline{X}_n=\frac1n(X_1+\cdots+X_n)

converges to the expected value:

\overline{X}_n \to \mu \quad\text{as}\quad n \to \infty.

(Lebesgue integrability of ''X_j'' means that the expected value E(''X_j'') exists according to Lebesgue integration and is finite. It does ''not'' mean that the associated probability measure is absolutely continuous with respect to Lebesgue measure.)

Introductory probability texts often additionally assume identical finite variance

\operatorname{Var}(X_i) = \sigma^2

(for all ''i'') and no correlation between random variables. In that case, the variance of the average of ''n'' random variables is

\operatorname{Var}(\overline{X}_n) = \operatorname{Var}(\tfrac1n(X_1+\cdots+X_n)) = \frac{1}{n^2} \operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},

which can be used to shorten and simplify the proofs. This assumption of finite variance is ''not necessary''. Large or infinite variance will make the convergence slower, but the law of large numbers holds anyway. Mutual independence of the random variables can be replaced by pairwise independence or exchangeability in both versions of the law.

The difference between the strong and the weak version is concerned with the mode of convergence being asserted. For interpretation of these modes, see Convergence of random variables.
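The identity Var(''X̄''ₙ) = ''σ''²/''n'' can be checked empirically. The sketch below assumes fair six-sided dice, for which ''σ''² = 35/12, and compares the variance of many simulated averages with ''σ''²/''n''; `sample_mean_variance` and `SIGMA2_DIE` are illustrative names.

```python
import random
import statistics

SIGMA2_DIE = 35 / 12  # variance of a single roll of a fair six-sided die


def sample_mean_variance(n, repetitions, seed=0):
    """Empirical variance, across repetitions, of the average of n fair-die rolls."""
    rng = random.Random(seed)
    averages = [
        sum(rng.randint(1, 6) for _ in range(n)) / n
        for _ in range(repetitions)
    ]
    return statistics.variance(averages)


if __name__ == "__main__":
    for n in (1, 10, 100):
        empirical = sample_mean_variance(n, repetitions=5_000)
        print(f"n={n:>3}  empirical Var(mean)={empirical:.4f}  sigma^2/n={SIGMA2_DIE / n:.4f}")
```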

Weak law

The weak law of large numbers (also called Khinchin's law) states that given a collection of independent and identically distributed (iid) samples from a random variable with finite mean, the sample mean converges in probability to the expected value. That is, for any positive number ''ε'',

\lim_{n\to\infty}\Pr\!\left(\,|\overline{X}_n-\mu| < \varepsilon\,\right) = 1.

Interpreting this result, the weak law states that for any nonzero margin specified (''ε''), no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.

As mentioned earlier, the weak law applies in the case of i.i.d. random variables, but it also applies in some other cases. For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. (If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first ''n'' values goes to zero as ''n'' goes to infinity. As an example, assume that each random variable in the series follows a Gaussian distribution (normal distribution) with mean zero, where the ''n''th variable has variance

2n/\log(n+1),

which is not bounded. At each stage, the average will be normally distributed (as the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is asymptotic to

n^2 / \log n.

The variance of the average is therefore asymptotic to

1 / \log n

and goes to zero. There are also examples of the weak law applying even though the expected value does not exist.
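For the Gaussian example, no simulation is needed: the variance of the average is a deterministic sum. This sketch computes it directly (up to floating-point rounding) and compares it with 1/log ''n''; `variance_of_average` is a hypothetical helper name.

```python
import math


def variance_of_average(n):
    """Var(X̄_n) when independent X_k ~ N(0, 2k / log(k + 1)), k = 1..n.

    Independence makes Var(X_1 + ... + X_n) the sum of the variances,
    and Var(X̄_n) = Var(X_1 + ... + X_n) / n^2.
    """
    total = sum(2 * k / math.log(k + 1) for k in range(1, n + 1))
    return total / n**2


if __name__ == "__main__":
    for n in (10, 10**3, 10**5, 10**6):
        v = variance_of_average(n)
        print(f"n={n:>8}  Var(mean)={v:.5f}  1/log n={1 / math.log(n):.5f}  ratio={v * math.log(n):.4f}")
```

The ratio column approaches 1 only slowly, as is typical of logarithmic rates, but the variance of the average still tends to zero, which is all Chebyshev's argument needs.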

Strong law

The strong law of large numbers (also called Kolmogorov's law) states that the sample average converges almost surely to the expected value. That is,

\Pr\!\left( \lim_{n\to\infty}\overline{X}_n = \mu \right) = 1.

What this means is that, as the number of trials ''n'' goes to infinity, the probability that the average of the observations converges to the expected value is equal to one. The modern proof of the strong law is more complex than that of the weak law, and relies on passing to an appropriate subsequence. The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem. This view justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-term average".

This law is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). However, the weak law is known to hold in certain conditions where the strong law does not hold, and then the convergence is only weak (in probability). See Differences between the weak law and the strong law.

The strong law applies to independent identically distributed random variables having an expected value (like the weak law). This was proved by Kolmogorov in 1930. It can also apply in other cases. Kolmogorov also showed, in 1933, that if the variables are independent and identically distributed, then for the average to converge almost surely on ''something'' (this can be considered another statement of the strong law), it is necessary that they have an expected value (and then of course the average will converge almost surely on that). If the summands are independent but not identically distributed, then

\overline{X}_n - \operatorname{E}\big[\overline{X}_n\big]\ \xrightarrow{\text{a.s.}}\ 0,

provided that each ''X''_''k'' has a finite second moment and

\sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.

This statement is known as ''Kolmogorov's strong law''.

Differences between the weak law and the strong law

The ''weak law'' states that for a specified large ''n'', the average ''X̄''ₙ is likely to be near ''μ''. Thus, it leaves open the possibility that

|\overline{X}_n -\mu| > \varepsilon

happens an infinite number of times, although at infrequent intervals. (Not necessarily

|\overline{X}_n -\mu| \neq 0

for all ''n''.) The ''strong law'' shows that this almost surely will not occur. It does not imply that with probability 1, we have that for any ''ε'' > 0 the inequality

|\overline{X}_n -\mu| < \varepsilon

holds for all large enough ''n'', since the convergence is not necessarily uniform on the set where it holds. There are cases in which the strong law does not hold but the weak law does.

Uniform laws of large numbers

There are extensions of the law of large numbers to collections of estimators, where the convergence is uniform over the collection; thus the name ''uniform law of large numbers''.

Suppose ''f''(''x'',''θ'') is some function defined for ''θ'' ∈ Θ, and continuous in ''θ''. Then for any fixed ''θ'', the sequence {''f''(''X''₁,''θ''), ''f''(''X''₂,''θ''), ...} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[''f''(''X'',''θ'')]. This is the ''pointwise'' (in ''θ'') convergence.

A particular example of a uniform law of large numbers states the conditions under which the convergence happens ''uniformly'' in ''θ''. If
# Θ is compact,
# ''f''(''x'',''θ'') is continuous at each ''θ'' ∈ Θ for almost all ''x''s, and is a measurable function of ''x'' at each ''θ'', and
# there exists a dominating function ''d''(''x'') such that E[''d''(''X'')] < ∞, and

\left\| f(x,\theta) \right\| \leq d(x) \quad\text{for all}\ \theta\in\Theta.

Then E[''f''(''X'',''θ'')] is continuous in ''θ'', and

\sup_{\theta\in\Theta} \left\| \frac{1}{n}\sum_{i=1}^n f(X_i,\theta) - \operatorname{E}[f(X,\theta)] \right\| \overset{\mathrm{P}}{\rightarrow} \ 0.

This result is useful to derive consistency of a large class of estimators (see Extremum estimator).
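To make the uniform law concrete, here is a sketch with an illustrative setup that is not from the text: ''X'' uniform on [0, 1], Θ = [0, 1], and ''f''(''x'',''θ'') = (''x'' − ''θ'')². These satisfy the three conditions above with dominating function ''d''(''x'') = 1, and E[''f''(''X'',''θ'')] = 1/12 + (''θ'' − 1/2)². The supremum over Θ is approximated by a maximum over a finite grid, which is an assumption of the sketch; the helper names are hypothetical.

```python
import random

THETA_GRID = [i / 20 for i in range(21)]  # finite grid standing in for the compact set Θ = [0, 1]


def expected_f(theta):
    """E[(X - θ)^2] for X uniform on [0, 1]: Var(X) + (E[X] - θ)^2 = 1/12 + (1/2 - θ)^2."""
    return 1 / 12 + (0.5 - theta) ** 2


def sup_deviation(samples, thetas):
    """Max over thetas of |(1/n) Σ (X_i - θ)^2 - E[(X - θ)^2]|."""
    n = len(samples)
    worst = 0.0
    for theta in thetas:
        sample_mean = sum((x - theta) ** 2 for x in samples) / n
        worst = max(worst, abs(sample_mean - expected_f(theta)))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1_000, 10_000, 100_000):
        samples = [rng.random() for _ in range(n)]
        print(f"n={n:>6}  max deviation over the grid = {sup_deviation(samples, THETA_GRID):.5f}")
```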

Borel's law of large numbers

Borel's law of large numbers, named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event is expected to occur approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if ''E'' denotes the event in question, ''p'' its probability of occurrence, and ''N_n''(''E'') the number of times ''E'' occurs in the first ''n'' trials, then with probability one,

\frac{N_n(E)}{n}\to p \text{ as } n\to\infty.

This theorem makes rigorous the intuitive notion of probability as the expected long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.

Chebyshev's inequality. Let ''X'' be a random variable with finite expected value ''μ'' and finite non-zero variance ''σ''². Then for any real number ''k'' > 0,

\Pr(|X-\mu| \geq k\sigma) \leq \frac{1}{k^2}.
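Chebyshev's inequality can be compared with an exact tail probability. The sketch assumes ''X'' is exponential with rate 1, so ''μ'' = ''σ'' = 1 and Pr(|''X'' − 1| ≥ ''k'') has a closed form; `exp1_tail` is a hypothetical name.

```python
import math


def exp1_tail(k):
    """Pr(|X - 1| >= k) for X ~ Exponential(rate 1), where μ = σ = 1."""
    upper = math.exp(-(1 + k))                       # Pr(X >= 1 + k)
    lower = 1 - math.exp(-(1 - k)) if k < 1 else 0.0  # Pr(X <= 1 - k); zero once 1 - k <= 0
    return upper + lower


if __name__ == "__main__":
    for k in (0.5, 1, 1.5, 2, 3, 5):
        print(f"k={k:<4}  Pr(|X-mu| >= k*sigma)={exp1_tail(k):.5f}  bound 1/k^2={1 / k**2:.5f}")
```

For ''k'' ≤ 1 the bound 1/''k''² is at least 1 and therefore uninformative; for larger ''k'' the exact tail lies far below it, which illustrates that Chebyshev's inequality is general but often loose.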

Proof of the weak law

Given ''X''₁, ''X''₂, ... an infinite sequence of i.i.d. random variables with finite expected value

E(X_1)=E(X_2)=\cdots=\mu<\infty

, we are interested in the convergence of the sample average

\overline{X}_n=\tfrac1n(X_1+\cdots+X_n).

The weak law of large numbers states:

\overline{X}_n\ \xrightarrow{P}\ \mu \qquad\text{when}\ n \to \infty.

Proof using Chebyshev's inequality assuming finite variance

This proof uses the assumption of finite variance

\operatorname{Var}(X_i)=\sigma^2

(for all ''i''). The independence of the random variables implies no correlation between them, and we have that

\operatorname{Var}(\overline{X}_n) = \operatorname{Var}(\tfrac1n(X_1+\cdots+X_n)) = \frac{1}{n^2} \operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.

The common mean ''μ'' of the sequence is the mean of the sample average:

E(\overline{X}_n) = \mu.

Using Chebyshev's inequality on ''X̄''ₙ results in

\operatorname{P}\left( \left| \overline{X}_n-\mu \right| \geq \varepsilon\right) \leq \frac{\sigma^2}{n\varepsilon^2}.

This may be used to obtain the following:

\operatorname{P}\left( \left| \overline{X}_n-\mu \right| < \varepsilon\right) = 1 - \operatorname{P}\left( \left| \overline{X}_n-\mu \right| \geq \varepsilon\right) \geq 1 - \frac{\sigma^2}{n\varepsilon^2}.

As ''n'' approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained

\overline{X}_n\ \xrightarrow{P}\ \mu \qquad\text{when}\ n \to \infty.
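The bound ''σ''²/(''nε''²) from the proof can be set beside simulated frequencies. This sketch assumes fair six-sided dice (''μ'' = 3.5, ''σ''² = 35/12) and a tolerance ''ε'' = 0.25; the helper names are illustrative.

```python
import random

SIGMA2_DIE = 35 / 12  # variance of one roll of a fair six-sided die (μ = 3.5)


def chebyshev_bound(sigma2, n, eps):
    """Chebyshev's upper bound σ²/(nε²) on Pr(|X̄_n - μ| >= ε)."""
    return sigma2 / (n * eps**2)


def empirical_deviation_prob(n, eps, repetitions, seed=0):
    """Fraction of simulated runs where the mean of n die rolls is at least eps from 3.5."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(repetitions):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - 3.5) >= eps:
            misses += 1
    return misses / repetitions


if __name__ == "__main__":
    eps = 0.25
    for n in (10, 100, 1_000):
        bound = min(chebyshev_bound(SIGMA2_DIE, n, eps), 1.0)  # probabilities never exceed 1
        freq = empirical_deviation_prob(n, eps, repetitions=1_000)
        print(f"n={n:>5}  simulated Pr={freq:.4f}  Chebyshev bound={bound:.4f}")
```

Both columns shrink as ''n'' grows; the simulated frequency is typically much smaller than the bound, since Chebyshev's inequality uses only the variance.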

Proof using convergence of characteristic functions

By Taylor's theorem for complex functions, the characteristic function of any random variable, ''X'', with finite mean ''μ'', can be written as

\varphi_X(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.

All ''X''₁, ''X''₂, ... have the same characteristic function, so we will simply denote this ''φ''_''X''. Among the basic properties of characteristic functions there are

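\varphi_{\frac 1 n X}(t)= \varphi_X(\tfrac t n) \quad \text{and} \quad \varphi_{X+Y}(t) = \varphi_X(t) \varphi_Y(t)

if ''X'' and ''Y'' are independent. These rules can be used to calculate the characteristic function of ''X̄''ₙ in terms of ''φ''_''X'':

\varphi_{\overline{X}_n}(t)= \left[\varphi_X\left({t \over n}\right)\right]^n = \left[1 + i\mu{t \over n} + o\left({t \over n}\right)\right]^n \, \rightarrow \, e^{it\mu}, \quad \text{as} \quad n \to \infty.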

The limit ''e''^''itμ'' is the characteristic function of the constant random variable ''μ'', and hence by the Lévy continuity theorem, ''X̄''ₙ converges in distribution to ''μ'':

\overline{X}_n \, \xrightarrow{\mathcal D} \, \mu \qquad\text{for}\qquad n \to \infty.

Since ''μ'' is a constant, convergence in distribution to ''μ'' and convergence in probability to ''μ'' are equivalent (see Convergence of random variables). Therefore,

\overline{X}_n \, \xrightarrow{P} \, \mu \qquad\text{for}\qquad n \to \infty.

This shows that the sample mean converges in probability to −''i'' times the derivative of the characteristic function at the origin, ''μ'' = −''i'' ''φ''′_''X''(0), as long as the latter exists.
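The limit in the characteristic-function proof can be watched numerically. As an assumption for illustration, take ''X'' exponential with rate 1, whose characteristic function is 1/(1 − ''it'') and whose mean is ''μ'' = 1; then [''φ_X''(''t''/''n'')]ⁿ should approach ''e''^''it''. The function names are hypothetical.

```python
import cmath


def phi_exponential(t):
    """Characteristic function of an Exponential(rate 1) variable (mean μ = 1)."""
    return 1 / (1 - 1j * t)


def phi_sample_mean(t, n):
    """φ of X̄_n for i.i.d. Exp(1) variables, using φ_{X̄_n}(t) = [φ_X(t/n)]^n."""
    return phi_exponential(t / n) ** n


if __name__ == "__main__":
    t, mu = 2.0, 1.0
    target = cmath.exp(1j * t * mu)  # characteristic function of the constant μ
    for n in (1, 10, 100, 10_000, 1_000_000):
        gap = abs(phi_sample_mean(t, n) - target)
        print(f"n={n:>8}  |phi_mean(t) - exp(i t mu)| = {gap:.2e}")
```

For this distribution the gap shrinks roughly like ''t''²/(2''n''), in line with the o(''t''/''n'') remainder in the expansion above.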

Proof of the strong law

We give a relatively simple proof of the strong law under the assumptions that the ''X_i'' are iid, and that

{\mathbb E}[X_i] =: \mu < \infty, \qquad \operatorname{Var}(X_i)=\sigma^2 < \infty, \qquad {\mathbb E}[X_i^4] =: \tau < \infty.

Let us first note that without loss of generality we can assume that ''μ'' = 0 by centering. In this case, the strong law says that

\Pr\!\left( \lim_{n\to\infty}\overline{X}_n = 0 \right) = 1,

or, writing ''S_n'' = ''X''₁ + ⋯ + ''X_n'',

\Pr\left(\omega: \lim_{n\to\infty}\frac{S_n(\omega)}n = 0 \right) = 1.

It is equivalent to show that

\Pr\left(\omega: \lim_{n\to\infty}\frac{S_n(\omega)}n \neq 0 \right) = 0.

Note that

\lim_{n\to\infty}\frac{S_n(\omega)}n \neq 0 \iff \exists\epsilon>0, \left|\frac{S_n(\omega)}n\right| \ge \epsilon\ \mbox{infinitely often},

and thus to prove the strong law we need to show that for every ''ε'' > 0, we have

\Pr\left(\omega: |S_n(\omega)| \ge n\epsilon \mbox{ infinitely often} \right) = 0.

Define the events

A_n = \{\omega : |S_n(\omega)| \ge n\epsilon\},

and if we can show that

\sum_{n=1}^\infty \Pr(A_n) <\infty,

then the Borel–Cantelli lemma implies the result. So let us estimate Pr(''A_n''). We compute

{\mathbb E}[S_n^4] = {\mathbb E}\left[\left(\sum_{i=1}^n X_i\right)^4\right] = {\mathbb E}\left[\sum_{1 \le i,j,k,l \le n} X_iX_jX_kX_l\right].

We first claim that every term of the form

X_i^3X_j, \quad X_i^2X_jX_k, \quad X_iX_jX_kX_l,

where all subscripts are distinct, must have zero expectation. This is because

{\mathbb E}[X_i^3X_j] = {\mathbb E}[X_i^3]\,{\mathbb E}[X_j]

by independence, and the last factor is zero; similarly for the other terms. Therefore the only terms in the sum with nonzero expectation are E[''X_i''⁴] and E[''X_i''²''X_j''²]. Since the ''X_i'' are identically distributed, all of these are the same, and moreover

{\mathbb E}[X_i^2X_j^2] = \left({\mathbb E}[X_i^2]\right)^2 = \sigma^4.

There are ''n'' terms of the form E[''X_i''⁴] and 3''n''(''n'' − 1) terms of the form (E[''X_i''²])², and so

{\mathbb E}[S_n^4] = n \tau + 3n(n-1)\sigma^4.

Note that the right-hand side is a quadratic polynomial in ''n'', and as such there exists a ''C'' > 0 such that E[''S_n''⁴] ≤ ''Cn''² for ''n'' sufficiently large. By Markov's inequality,

\Pr(|S_n| \ge n \epsilon) \le \frac{1}{(n \epsilon)^4}{\mathbb E}[S_n^4] \le \frac{C}{\epsilon^4 n^2}

for ''n'' sufficiently large, and therefore this series is summable. Since this holds for any ''ε'' > 0, we have established the strong law of large numbers.

The proof can be strengthened considerably by dropping all finiteness assumptions on the second and fourth moments. It can also be extended, for example, to discuss partial sums of distributions without any finite moments. Such proofs use more intricate arguments to prove the same Borel–Cantelli predicate, a strategy attributed to Kolmogorov to conceptually bring the limit inside the probability parentheses. For a proof without the added assumption of a finite fourth moment, see Section 22 of
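The identity E[''S_n''⁴] = ''nτ'' + 3''n''(''n'' − 1)''σ''⁴ used in the proof can be verified exactly for a small distribution by enumerating every outcome. The distribution below (''X'' = −2 with probability 1/3, ''X'' = 1 with probability 2/3, so E[''X''] = 0, ''σ''² = 2, ''τ'' = 6) is a hypothetical choice made for illustration; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical centered distribution: X = -2 w.p. 1/3, X = 1 w.p. 2/3, so E[X] = 0.
DIST = {-2: Fraction(1, 3), 1: Fraction(2, 3)}


def moment(k):
    """E[X^k], computed exactly from DIST."""
    return sum(p * x**k for x, p in DIST.items())


def fourth_moment_of_sum(n):
    """E[S_n^4] by brute-force enumeration of all outcomes of (X_1, ..., X_n)."""
    total = Fraction(0)
    for outcome in product(DIST.items(), repeat=n):
        prob, s = Fraction(1), 0
        for x, p in outcome:
            prob *= p
            s += x
        total += prob * s**4
    return total


def formula(n):
    """n·τ + 3n(n - 1)·σ⁴, as derived in the proof (valid when E[X] = 0)."""
    sigma2, tau = moment(2), moment(4)
    return n * tau + 3 * n * (n - 1) * sigma2**2


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, fourth_moment_of_sum(n), formula(n))
```

Since ''nτ'' ≤ ''n''²''τ'' and 3''n''(''n'' − 1)''σ''⁴ ≤ 3''n''²''σ''⁴, one admissible constant in the proof is ''C'' = ''τ'' + 3''σ''⁴, valid for every ''n'' ≥ 1.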

Consequences

The law of large numbers provides not only the expectation of an unknown distribution from a realization of the sequence, but also any feature of the probability distribution. By applying Borel's law of large numbers, one could easily obtain the probability mass function: for each event in the objective probability mass function, one could approximate the probability of the event's occurrence with the proportion of times that the event occurs. The larger the number of repetitions, the better the approximation. For the continuous case, let

C=(a-h,a+h]

for small positive ''h''. Thus, for large ''n'':

\frac{N_n(C)}{n}\thickapprox p = P(X\in C) = \int_{a-h}^{a+h} f(x) \, dx \thickapprox 2hf(a)

With this method, one can cover the whole ''x''-axis with a grid (with grid size 2''h'') and obtain a bar graph which is called a histogram.
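A sketch of the density estimate and histogram described above, assuming a standard normal sample so that the estimate can be compared with ''f''(0) = 1/√(2π) ≈ 0.3989. `density_estimate` uses the interval ''C'' = (''a'' − ''h'', ''a'' + ''h''] from the text; `histogram` uses half-open cells [left, right) for simplicity. Both names are illustrative.

```python
import math
import random


def density_estimate(samples, a, h):
    """Estimate f(a) by N_n(C) / (2 h n), where C = (a - h, a + h]."""
    hits = sum(1 for x in samples if a - h < x <= a + h)
    return hits / (2 * h * len(samples))


def histogram(samples, lo, hi, h):
    """Bar heights count / (2 h n) on cells of width 2h covering [lo, hi).

    Cells are half-open [left, left + 2h), a simplification of the text's (a - h, a + h].
    """
    width = 2 * h
    cells = round((hi - lo) / width)
    counts = [0] * cells
    for x in samples:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), cells - 1)] += 1
    return [c / (width * len(samples)) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
    true_f0 = 1 / math.sqrt(2 * math.pi)
    print(f"estimated f(0) = {density_estimate(xs, 0.0, 0.05):.4f}, exact f(0) = {true_f0:.4f}")
    for i, height in enumerate(histogram(xs, -3.0, 3.0, 0.25)):
        left = -3.0 + 0.5 * i
        print(f"[{left:+.1f}, {left + 0.5:+.1f})  {'#' * round(100 * height)}")
```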

Applications

One application of the law of large numbers is an important method of approximation known as the Monte Carlo method, which uses a random sampling of numbers to approximate numerical results. The algorithm to compute an integral of ''f''(''x'') on an interval [''a'',''b''] is as follows:
# Simulate uniform random variables ''X''₁, ''X''₂, ..., ''X_n''. This can be done with software or with a random number table that gives ''U''₁, ''U''₂, ..., ''U_n'', independent and identically distributed (i.i.d.) random variables on [0,1]. Then let ''X_i'' = ''a'' + (''b'' − ''a'')''U_i'' for ''i'' = 1, 2, ..., ''n''. Then ''X''₁, ''X''₂, ..., ''X_n'' are independent and identically distributed uniform random variables on [''a'', ''b''].
# Evaluate ''f''(''X''₁), ''f''(''X''₂), ..., ''f''(''X_n'').
# Take the average of ''f''(''X''₁), ''f''(''X''₂), ..., ''f''(''X_n'') by computing

(b-a)\tfrac{f(X_1)+f(X_2)+\cdots+f(X_n)}{n}

and then by the strong law of large numbers, this converges to

(b-a)E(f(X_1)) = (b-a)\int_a^b f(x)\tfrac{1}{b-a}\,dx = \int_a^b f(x)\,dx.
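The three steps can be sketched in Python as follows. This is a minimal illustration, not a production integrator: Python's `random.random()` plays the role of the uniform random number table, `monte_carlo_integral` is a hypothetical name, and the integrand sin ''x'' on [0, π] (exact value 2) is an illustrative choice, not one of the examples in the text.

```python
import math
import random


def monte_carlo_integral(f, a, b, n, seed=0):
    """Estimate ∫_a^b f(x) dx by (b - a) · (1/n) Σ f(X_i) with X_i uniform on [a, b)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.random()          # step 1: U_i uniform on [0, 1)
        x = a + (b - a) * u       # X_i = a + (b - a) U_i
        total += f(x)             # step 2: evaluate f(X_i)
    return (b - a) * total / n    # step 3: scaled average


if __name__ == "__main__":
    for n in (25, 250, 2_500, 250_000):
        print(f"n={n:>7}  estimate={monte_carlo_integral(math.sin, 0.0, math.pi, n):.5f}  (exact 2)")
```

The estimate's standard error shrinks like 1/√''n'', so each extra decimal digit of accuracy costs roughly a hundredfold increase in ''n''.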

We can find the integral of

f(x) = cos^2(x)\sqrt

on 1,2. Computing this integral with traditional methods is very difficult, so the Monte Carlo method can be used here. Using the above algorithm, we get

\int_^ f(x)

= 0.905 when ''n'' = 25 and

\int_^ f(x)

= 1.028 when ''n'' = 250. We observe that as ''n'' increases, the numerical value also increases. The actual value of the integral is

\int_^ f(x)

= 1.000194. With the larger number of samples, the approximation of the integral was closer to its true value, as the law of large numbers predicts. Another example is the integration of f(x) =

\frac

on ,1. Using the Monte Carlo method and the LLN, we can see that as the number of samples increases, the numerical value gets closer to 0.4180233.


External links

* Animations for the Law of Large Numbers by Yihui Xie using the R package animation

* Apple CEO Tim Cook said something that would make statisticians cringe. "We don't believe in such laws as laws of large numbers. This is sort of, uh, old dogma, I think, that was cooked up by somebody …," said Tim Cook, while ''Business Insider'' explained: "However, the law of large numbers has nothing to do with large companies, large revenues, or large growth rates. The law of large numbers is a fundamental concept in probability theory and statistics, tying together theoretical probabilities that we can calculate to the actual outcomes of experiments that we empirically perform."