In statistics, the method of moments is a method of

estimation

of population

parameters

. The same principle is used to derive higher moments like skewness and kurtosis. It starts by expressing the population moments (i.e., the expected values of powers of the random variable under consideration) as functions of the parameters of interest. Those expressions are then set equal to the sample moments. The number of such equations is the same as the number of parameters to be estimated. Those equations are then solved for the parameters of interest. The solutions are estimates of those parameters. The method of moments was introduced by

Pafnuty Chebyshev

in 1887 in the proof of the central limit theorem. The idea of matching empirical moments of a distribution to the population moments dates back at least to

Pearson.

Method

Suppose that the problem is to estimate

k

unknown parameters

\theta_1, \theta_2, \dots, \theta_k

characterizing the distribution

f_W(w; \theta)

of the random variable

W

. Suppose the first

k

moments of the true distribution (the "population moments") can be expressed as functions of the

\theta

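s:

\begin{align} \mu_1 & \equiv \operatorname E[W] = g_1(\theta_1, \theta_2, \ldots, \theta_k), \\ \mu_2 & \equiv \operatorname E[W^2] = g_2(\theta_1, \theta_2, \ldots, \theta_k), \\ & \,\,\, \vdots \\ \mu_k & \equiv \operatorname E[W^k] = g_k(\theta_1, \theta_2, \ldots, \theta_k). \end{align}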

Suppose a sample of size

n

is drawn, resulting in the values

w_1, \dots, w_n

. For

j=1,\dots,k

, let

\widehat\mu_j = \frac{1}{n} \sum_{i=1}^n w_i^j

be the ''j''-th sample moment, an estimate of

\mu_j

. The method of moments estimator for

\theta_1, \theta_2, \ldots, \theta_k

denoted by

\widehat\theta_1, \widehat\theta_2, \dots, \widehat\theta_k

is defined as the solution (if there is one) to the equations:

\begin{align} \widehat \mu_1 & = g_1(\widehat\theta_1, \widehat\theta_2, \ldots, \widehat\theta_k), \\ \widehat \mu_2 & = g_2(\widehat\theta_1, \widehat\theta_2, \ldots, \widehat\theta_k), \\ & \,\,\, \vdots \\ \widehat \mu_k & = g_k(\widehat\theta_1, \widehat\theta_2, \ldots, \widehat\theta_k). \end{align}
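The recipe above can be sketched in code for one concrete case. The gamma distribution is not discussed in this article and is chosen here only as an illustration: with shape and scale parameters, its first two population moments are shape * scale and shape * (shape + 1) * scale^2, so the two moment equations can be solved in closed form. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def sample_moment(values: Sequence[float], j: int) -> float:
    """Return the j-th raw sample moment (1/n) * sum(w_i ** j)."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(w ** j for w in values) / len(values)


def gamma_method_of_moments(values: Sequence[float]) -> Tuple[float, float]:
    """Estimate (shape, scale) of a gamma distribution by matching two moments.

    Population moments (illustrative assumption, not from the article):
        mu_1 = shape * scale
        mu_2 = shape * (shape + 1) * scale ** 2
    """
    m1 = sample_moment(values, 1)
    m2 = sample_moment(values, 2)
    spread = m2 - m1 ** 2  # equals shape * scale ** 2 at the solution
    if m1 <= 0 or spread <= 0:
        raise ValueError("moment equations have no solution in the parameter space")
    scale = spread / m1
    shape = m1 ** 2 / spread
    return shape, scale


shape_hat, scale_hat = gamma_method_of_moments([1.0, 2.0, 3.0, 4.0])
print(shape_hat, scale_hat)
```

Two parameters require two moment equations; a family with more parameters would need correspondingly more sample moments and, in general, a numerical solver.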

Advantages and disadvantages

The method of moments is fairly simple and yields

consistent estimator

s (under very weak assumptions), though these estimators are often biased. It is an alternative to the method of maximum likelihood. In some cases the likelihood equations are intractable without computers, whereas the method-of-moments estimators can be computed much more quickly and easily. Because they are easy to compute, method-of-moments estimates may be used as a first approximation to the solutions of the likelihood equations, and successively improved approximations may then be found by the Newton–Raphson method. In this way the method of moments can assist in finding maximum likelihood estimates. In some cases, infrequent with large samples but less so with small samples, the estimates given by the method of moments lie outside the parameter space (as shown in the example below); it does not make sense to rely on them then. That problem never arises in the method of

maximum likelihood

. Also, estimates by the method of moments are not necessarily sufficient statistics, i.e., they sometimes fail to take into account all relevant information in the sample. When estimating other structural parameters (e.g., parameters of a

utility function

, instead of parameters of a known probability distribution), appropriate probability distributions may not be known, and moment-based estimates may be preferred to maximum likelihood estimation.
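The remark about using method-of-moments estimates as starting values for Newton–Raphson can be illustrated with a small sketch. The model below, a logistic distribution with unknown location and scale fixed at 1, is an assumption made for this example and does not appear in the article; it is used because its likelihood equation has no closed-form solution while its mean equals the location, so the sample mean is the method-of-moments estimate. All function names are hypothetical.

```python
import math
from typing import Sequence


def score(values: Sequence[float], mu: float) -> float:
    """Derivative of the logistic(mu, scale=1) log-likelihood with respect to mu."""
    return sum(math.tanh((x - mu) / 2.0) for x in values)


def score_derivative(values: Sequence[float], mu: float) -> float:
    """Second derivative of the log-likelihood; negative for every mu."""
    return -0.5 * sum(1.0 / math.cosh((x - mu) / 2.0) ** 2 for x in values)


def mle_location(values: Sequence[float], tol: float = 1e-12, max_iter: int = 50) -> float:
    """Newton-Raphson on the likelihood equation, started at the sample mean."""
    mu = sum(values) / len(values)  # method-of-moments start, since E[W] = mu
    for _ in range(max_iter):
        step = score(values, mu) / score_derivative(values, mu)
        mu -= step
        if abs(step) < tol:
            break
    return mu


data = [0.0, 0.0, 0.0, 4.0]
mu_hat = mle_location(data)
print(mu_hat, score(data, mu_hat))
```

For skewed data such as this sample, the refined estimate moves away from the sample mean, while the score at the result is numerically zero.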

Examples

An example application of the method of moments is to estimate polynomial probability density distributions. In this case, an approximate polynomial of order

N

is defined on an interval

[a,b]. The method of moments then yields a system of equations, whose solution involves the inversion of a

Hankel matrix

. [J. Munkhammar, L. Mattsson, J. Rydén (2017). "Polynomial probability distribution estimation using the method of moments". PLoS ONE 12(4): e0174573. https://doi.org/10.1371/journal.pone.0174573]
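A minimal sketch of why a Hankel matrix appears, under the assumption that the density is written as p(x) = c_0 + c_1 x + ... + c_N x^N on [a,b]: the i-th moment is then the sum over j of c_j times the integral of x^(i+j), and that integral depends only on i + j. This is an illustration of the idea and not necessarily the exact formulation of the cited paper; the function names are hypothetical, and the zeroth moment (equal to 1) is included as the normalization condition.

```python
from fractions import Fraction
from typing import List, Sequence


def hankel_moment_matrix(a, b, order: int) -> List[List[Fraction]]:
    """Entry (i, j) is the integral of x**(i + j) over [a, b], so it depends only on i + j."""
    a, b = Fraction(a), Fraction(b)

    def integral(power: int) -> Fraction:
        return (b ** (power + 1) - a ** (power + 1)) / (power + 1)

    return [[integral(i + j) for j in range(order + 1)] for i in range(order + 1)]


def solve(matrix: List[List[Fraction]], rhs: Sequence[Fraction]) -> List[Fraction]:
    """Gauss-Jordan elimination in exact rational arithmetic (assumes a nonsingular matrix)."""
    n = len(rhs)
    rows = [list(row) + [Fraction(value)] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot_row = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot_row] = rows[pivot_row], rows[col]
        pivot = rows[col][col]
        rows[col] = [x / pivot for x in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


def fit_polynomial_density(a, b, moments: Sequence[Fraction]) -> List[Fraction]:
    """Return coefficients c_0..c_N matching the moments mu_0..mu_N on [a, b]."""
    order = len(moments) - 1
    return solve(hankel_moment_matrix(a, b, order), moments)


# Moments 1, 1/2, 1/3 are those of the uniform density on [0, 1].
print(fit_polynomial_density(0, 1, [Fraction(1), Fraction(1, 2), Fraction(1, 3)]))
```

In practice the right-hand side would hold sample moments rather than exact ones, and nothing in this sketch forces the fitted polynomial to be non-negative on the interval.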

Proving the central limit theorem

Let

X_1, X_2, \cdots

be independent random variables with mean 0 and variance 1, then let

S_n := \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i

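. We can compute the moments of

S_n

to leading order in n (assuming the X_i have uniformly bounded moments of every order):

\operatorname E[S_n^0] = 1, \operatorname E[S_n^1] = 0, \operatorname E[S_n^2] = 1, \operatorname E[S_n^3] = 0, \cdots

Explicit expansion shows that, up to terms that vanish as n \to \infty,

\operatorname E[S_n^{2k+1}] = 0; \quad \operatorname E[S_n^{2k}] = \frac{\binom{n}{k} \frac{(2k)!}{2^k}}{n^k} = \frac{n(n-1)\cdots(n-k+1)}{n^k} (2k-1)!!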

where the numerator is the number of ways to select

k

distinct pairs of balls by picking one each from

2k

buckets, each containing balls numbered from

1

to

n

. At the

n \to \infty

limit, all moments converge to those of a standard normal distribution. Further analysis then shows that this convergence in moments implies convergence in distribution. Essentially this argument was published by Chebyshev in 1887.
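The convergence of the even moments can be checked numerically. The sketch below evaluates the pair-counting expression for the 2k-th moment and compares it with (2k-1)!!, the corresponding moment of a standard normal variable. As an illustrative assumption not made in the article, it also computes the exact moment when each X_i is +1 or -1 with probability 1/2; the two differ by a term that shrinks as n grows. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def pairing_term(n: int, k: int) -> Fraction:
    """C(n, k) * (2k)! / 2**k divided by n**k: the pair-counting expression."""
    return Fraction(comb(n, k) * factorial(2 * k), 2 ** k * n ** k)


def double_factorial_odd(k: int) -> int:
    """(2k - 1)!!, the 2k-th moment of a standard normal variable."""
    result = 1
    for odd in range(1, 2 * k, 2):
        result *= odd
    return result


def rademacher_even_moment(n: int, k: int) -> Fraction:
    """Exact E[S_n^(2k)] when each X_i is +1 or -1 with probability 1/2."""
    total = sum(comb(n, heads) * (2 * heads - n) ** (2 * k) for heads in range(n + 1))
    return Fraction(total, 2 ** n * n ** k)


for n in (10, 100, 1000):
    print(n, float(pairing_term(n, 2)), float(rademacher_even_moment(n, 2)),
          double_factorial_odd(2))
```

Both columns approach 3 for the fourth moment, which is the value for a standard normal distribution.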

Uniform distribution

Consider the uniform distribution on the interval

[a,b], denoted U(a,b). If W\sim U(a,b) then we have

\mu_1 = \operatorname E[W] = \frac{1}{2}(a+b)

\mu_2 = \operatorname E[W^2] = \frac{1}{3}(a^2+ab+b^2)

Solving these equations gives

\widehat{a} = \mu_1 - \sqrt{3\left(\mu_2-\mu_1^2\right)}

\widehat{b} = \mu_1 + \sqrt{3\left(\mu_2-\mu_1^2\right)}

Given a set of samples \{w_i\} we can use the sample moments \widehat{\mu}_1 and \widehat{\mu}_2 in these formulae in order to estimate a and b.

Note, however, that this method can produce estimates that are incompatible with the observed data. For example, the set of samples \{0,0,0,0,1\} results in the estimates \widehat{a}=\frac{1}{5}-\frac{2\sqrt{3}}{5}, \widehat{b}=\frac{1}{5}+\frac{2\sqrt{3}}{5}, even though \widehat{b}<1, so it is impossible for the set \{0,0,0,0,1\} to have been drawn from U(\widehat{a},\widehat{b}).
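The uniform-distribution formulas translate directly into code. The sketch below plugs the first two sample moments into the expressions for the endpoints and applies them to the sample 0, 0, 0, 0, 1, which shows an upper estimate below the largest observation. The function name is hypothetical, and the guard against a slightly negative variance is a floating-point precaution added here.

```python
import math
from typing import Sequence, Tuple


def uniform_method_of_moments(values: Sequence[float]) -> Tuple[float, float]:
    """Estimate the endpoints (a, b) of U(a, b) from the first two sample moments."""
    n = len(values)
    m1 = sum(values) / n
    m2 = sum(w * w for w in values) / n
    # m2 - m1**2 is non-negative in exact arithmetic; clamp rounding noise.
    half_width = math.sqrt(3.0 * max(0.0, m2 - m1 ** 2))
    return m1 - half_width, m1 + half_width


sample = [0.0, 0.0, 0.0, 0.0, 1.0]
a_hat, b_hat = uniform_method_of_moments(sample)
print(a_hat, b_hat, max(sample) <= b_hat)
```

The final printed value is False for this sample, because the estimated upper endpoint is smaller than the observation 1.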
