In econometrics and statistics, the generalized method of moments (GMM) is a generic method for estimating parameters in statistical models. Usually it is applied in the context of semiparametric models, where the parameter of interest is finite-dimensional, whereas the full shape of the data's distribution function may not be known, and therefore maximum likelihood estimation is not applicable.

The method requires that a certain number of ''moment conditions'' be specified for the model. These moment conditions are functions of the model parameters and the data, such that their expectation is zero at the parameters' true values. The GMM method then minimizes a certain norm of the sample averages of the moment conditions, and can therefore be thought of as a special case of minimum-distance estimation.

The GMM estimators are known to be consistent, asymptotically normal, and most efficient in the class of all estimators that do not use any extra information aside from that contained in the moment conditions. GMM was advocated by Lars Peter Hansen in 1982 as a generalization of the method of moments, introduced by Karl Pearson in 1894. However, these estimators are mathematically equivalent to those based on "orthogonality conditions" (Sargan, 1958, 1959) or "unbiased estimating equations" (Huber, 1967; Wang et al., 1997).


Description

Suppose the available data consists of ''T'' observations {''Yt''}, ''t'' = 1, ..., ''T'', where each observation ''Yt'' is an ''n''-dimensional multivariate random variable. We assume that the data come from a certain statistical model, defined up to an unknown parameter ''θ'' ∈ Θ. The goal of the estimation problem is to find the “true” value of this parameter, ''θ''0, or at least a reasonably close estimate.

A general assumption of GMM is that the data ''Yt'' be generated by a weakly stationary ergodic stochastic process. (The case of independent and identically distributed (iid) variables ''Yt'' is a special case of this condition.)

In order to apply GMM, we need to have "moment conditions", that is, we need to know a vector-valued function ''g''(''Y'', ''θ'') such that

: m(\theta_0) \equiv \operatorname{E}[\,g(Y_t,\theta_0)\,] = 0,

where E denotes expectation, and ''Yt'' is a generic observation. Moreover, the function ''m''(''θ'') must differ from zero for ''θ'' ≠ ''θ''0, otherwise the parameter ''θ'' will not be point-identified.

The basic idea behind GMM is to replace the theoretical expected value E[⋅] with its empirical analog, the sample average:

: \hat{m}(\theta) \equiv \frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)

and then to minimize the norm of this expression with respect to ''θ''. The minimizing value of ''θ'' is our estimate for ''θ''0.

By the law of large numbers, \hat{m}(\theta) \approx \operatorname{E}[g(Y_t,\theta)] = m(\theta) for large values of ''T'', and thus we expect that \hat{m}(\theta_0) \approx m(\theta_0) = 0. The generalized method of moments looks for a number \hat\theta which would make \hat{m}(\hat\theta) as close to zero as possible. Mathematically, this is equivalent to minimizing a certain norm of \hat{m}(\theta) (the norm of ''m'', denoted as ||''m''||, measures the distance between ''m'' and zero). The properties of the resulting estimator will depend on the particular choice of the norm function, and therefore the theory of GMM considers an entire family of norms, defined as

: \| \hat{m}(\theta) \|^2_W = \hat{m}(\theta)^{\mathsf{T}}\,W\,\hat{m}(\theta),

where ''W'' is a positive-definite weighting matrix, and ^{\mathsf{T}} denotes transposition. In practice, the weighting matrix ''W'' is computed based on the available data set, which will be denoted as \hat{W}. Thus, the GMM estimator can be written as

: \hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)^{\mathsf{T}} \hat{W} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)

Under suitable conditions this estimator is consistent, asymptotically normal, and with the right choice of weighting matrix \hat{W} also asymptotically efficient.
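As a concrete illustration, the following sketch estimates the mean and variance of a simulated normal sample by minimizing the quadratic form above with an identity weighting matrix. The choice of moment conditions, the simulated data, and the use of scipy's Nelder–Mead optimizer are assumptions of this illustration, not part of the method's definition.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def g(y, theta):
    """Illustrative moment conditions for theta = (mu, sigma2) of a normal sample:
    E[Y - mu] = 0, E[(Y - mu)^2 - sigma2] = 0, and (for normal data) E[Y^3 - mu^3 - 3*mu*sigma2] = 0."""
    mu, sigma2 = theta
    return np.column_stack([y - mu,
                            (y - mu)**2 - sigma2,
                            y**3 - mu**3 - 3*mu*sigma2])

def gmm_objective(theta, y, W):
    """Quadratic form m_hat(theta)' W m_hat(theta), where m_hat is the sample average of g."""
    m_hat = g(y, theta).mean(axis=0)
    return m_hat @ W @ m_hat

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=1000)   # simulated data: true mu = 2, sigma2 = 2.25

W = np.eye(3)                                   # simple first-step weighting matrix: identity
res = minimize(gmm_objective, x0=np.array([0.0, 1.0]), args=(y, W), method="Nelder-Mead")
print("GMM estimate of (mu, sigma2):", res.x)
</syntaxhighlight>

With three moment conditions and two parameters the model is over-identified, which is the setting used again in the sketches for the weighting matrix and the ''J''-test below.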


Properties


Consistency

Consistency is a statistical property of an estimator stating that, given a sufficient number of observations, the estimator converges in probability to the true value of the parameter:

: \hat\theta \xrightarrow{p} \theta_0\ \text{as}\ T\to\infty.

Sufficient conditions for a GMM estimator to be consistent are as follows:
# \hat{W}_T \xrightarrow{p} W, where ''W'' is a positive semi-definite matrix,
# W\operatorname{E}[\,g(Y_t,\theta)\,]=0 only for \theta=\theta_0,
# The space of possible parameters \Theta \subset \mathbb{R}^{k} is compact,
# g(Y,\theta) is continuous at each ''θ'' with probability one,
# \operatorname{E}\big[\,\textstyle\sup_{\theta\in\Theta} \lVert g(Y,\theta)\rVert\,\big]<\infty.

The second condition here (the so-called global identification condition) is often particularly hard to verify. There exist simpler necessary but not sufficient conditions, which may be used to detect a non-identification problem:
* Order condition. The dimension of the moment function ''m''(''θ'') should be at least as large as the dimension of the parameter vector ''θ''.
* Local identification. If ''g''(''Y'', ''θ'') is continuously differentiable in a neighborhood of \theta_0, then the matrix W\operatorname{E}[\nabla_\theta g(Y_t,\theta_0)] must have full column rank (a numerical check of this condition is sketched below).

In practice applied econometricians often simply ''assume'' that global identification holds, without actually proving it.
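The local identification condition can be examined numerically by estimating the Jacobian of the sample moment function and checking that it has full column rank. A minimal sketch, assuming the same illustrative normal-moment model as in the earlier example:

<syntaxhighlight lang="python">
import numpy as np

def g(y, theta):
    """Illustrative moment conditions for (mu, sigma2) of a normal sample (an assumption of this example)."""
    mu, sigma2 = theta
    return np.column_stack([y - mu,
                            (y - mu)**2 - sigma2,
                            y**3 - mu**3 - 3*mu*sigma2])

def sample_jacobian(y, theta, eps=1e-6):
    """Finite-difference Jacobian of the sample moments m_hat(theta) = mean_t g(Y_t, theta)."""
    theta = np.asarray(theta, dtype=float)
    m0 = g(y, theta).mean(axis=0)
    G = np.zeros((m0.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        G[:, j] = (g(y, theta + step).mean(axis=0) - m0) / eps
    return G

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=1000)
G_hat = sample_jacobian(y, theta=[2.0, 2.25])
# Full column rank (here rank 2) of the estimated Jacobian is the local identification condition.
print("rank of estimated Jacobian:", np.linalg.matrix_rank(G_hat))
</syntaxhighlight>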


Asymptotic normality

Asymptotic normality is a useful property, as it allows us to construct confidence bands for the estimator and conduct different tests. Before we can make a statement about the asymptotic distribution of the GMM estimator, we need to define two auxiliary matrices:

: G = \operatorname{E}[\,\nabla_\theta\,g(Y_t,\theta_0)\,], \qquad \Omega = \operatorname{E}[\,g(Y_t,\theta_0)\,g(Y_t,\theta_0)^{\mathsf{T}}\,]

Then under conditions 1–6 listed below, the GMM estimator will be asymptotically normal with limiting distribution

: \sqrt{T}\big(\hat\theta - \theta_0\big)\ \xrightarrow{d}\ \mathcal{N}\big[0,\ (G^{\mathsf{T}}WG)^{-1}G^{\mathsf{T}}W\,\Omega\, W^{\mathsf{T}}G\,(G^{\mathsf{T}}W^{\mathsf{T}}G)^{-1}\big],

a "sandwich" covariance whose plug-in estimation is sketched after the list of conditions.

Conditions:
# \hat\theta is consistent (see previous section),
# The set of possible parameters \Theta \subset \mathbb{R}^{k} is compact,
# g(Y,\theta) is continuously differentiable in some neighborhood ''N'' of \theta_0 with probability one,
# \operatorname{E}[\,\lVert g(Y_t,\theta) \rVert^2\,]<\infty,
# \operatorname{E}\big[\,\textstyle\sup_{\theta\in N}\lVert \nabla_\theta g(Y_t,\theta) \rVert\,\big]<\infty,
# the matrix G^{\mathsf{T}}WG is nonsingular.
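The sandwich covariance can be estimated by plugging in sample analogues of ''G'', Ω, and ''W''. A minimal sketch, again assuming the illustrative normal-moment model; the finite-difference Jacobian and the plug-in estimate of Ω are assumptions of this example:

<syntaxhighlight lang="python">
import numpy as np

def g(y, theta):
    """Illustrative moment conditions for (mu, sigma2) of a normal sample."""
    mu, sigma2 = theta
    return np.column_stack([y - mu,
                            (y - mu)**2 - sigma2,
                            y**3 - mu**3 - 3*mu*sigma2])

def numerical_G(y, theta, eps=1e-6):
    """Finite-difference estimate of G = E[grad_theta g(Y_t, theta)]."""
    theta = np.asarray(theta, dtype=float)
    m0 = g(y, theta).mean(axis=0)
    cols = []
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        cols.append((g(y, theta + step).mean(axis=0) - m0) / eps)
    return np.column_stack(cols)

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=2000)
theta_hat = np.array([y.mean(), y.var()])                  # stand-in for a GMM estimate

G_hat = numerical_G(y, theta_hat)
Omega_hat = g(y, theta_hat).T @ g(y, theta_hat) / len(y)   # plug-in estimate of E[g g']
W = np.eye(3)                                              # weighting matrix used in estimation

# Sandwich covariance: (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}
A = np.linalg.inv(G_hat.T @ W @ G_hat)
avar = A @ (G_hat.T @ W @ Omega_hat @ W @ G_hat) @ A
print("asymptotic standard errors:", np.sqrt(np.diag(avar) / len(y)))
</syntaxhighlight>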


Relative efficiency

So far we have said nothing about the choice of matrix ''W'', except that it must be positive semi-definite. In fact any such matrix will produce a consistent and asymptotically normal GMM estimator; the only difference will be in the asymptotic variance of that estimator. It can be shown that taking

: W \propto\ \Omega^{-1}

will result in the most efficient estimator in the class of all (generalized) method of moments estimators. Only with an infinite number of orthogonality conditions would the variance attain its smallest possible value, the Cramér–Rao bound. In this case the formula for the asymptotic distribution of the GMM estimator simplifies to

: \sqrt{T}\big(\hat\theta - \theta_0\big)\ \xrightarrow{d}\ \mathcal{N}\big[0,\ (G^{\mathsf{T}}\,\Omega^{-1}G)^{-1}\big]

The proof that such a choice of weighting matrix is indeed locally optimal is often adopted with slight modifications when establishing efficiency of other estimators. As a rule of thumb, a weighting matrix comes closer to optimality as the implied asymptotic variance comes closer to the Cramér–Rao bound.
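A quick numerical check, using hypothetical values of ''G'' and Ω (assumptions of this illustration), that the sandwich covariance indeed collapses to (G^{\mathsf{T}}\Omega^{-1}G)^{-1} when W = \Omega^{-1}:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(3, 2))                 # hypothetical k x l Jacobian: k = 3 moments, l = 2 parameters
A = rng.normal(size=(3, 3))
Omega = A @ A.T + 3 * np.eye(3)             # hypothetical positive-definite Omega

W = np.linalg.inv(Omega)                    # the efficient weighting matrix
bread = np.linalg.inv(G.T @ W @ G)
sandwich = bread @ (G.T @ W @ Omega @ W @ G) @ bread
simplified = np.linalg.inv(G.T @ np.linalg.inv(Omega) @ G)
print(np.allclose(sandwich, simplified))    # True: the two formulas coincide when W = Omega^{-1}
</syntaxhighlight>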


Implementation

One difficulty with implementing the outlined method is that we cannot take W = \Omega^{-1} because, by the definition of matrix Ω, we need to know the value of ''θ''0 in order to compute this matrix, and ''θ''0 is precisely the quantity we do not know and are trying to estimate in the first place. In the case of ''Yt'' being iid we can estimate ''W'' as

: \hat{W}_T(\hat\theta) = \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\,g(Y_t,\hat\theta)^{\mathsf{T}}\bigg)^{-1}.

Several approaches exist to deal with this circularity. The most popular is two-step feasible GMM: estimate ''θ'' once with a preliminary weighting matrix (for example the identity), plug the estimate into the formula above to obtain \hat{W}_T, and minimize again; this procedure is sketched below. Variants include iterated GMM, which repeats the updating until the estimates converge, and continuously updating GMM, which estimates ''θ'' and ''W'' simultaneously.

Another important issue in implementation of the minimization procedure is that the function is supposed to search through a (possibly high-dimensional) parameter space ''Θ'' and find the value of ''θ'' which minimizes the objective function. No generic recommendation for such a procedure exists; it is a subject of its own field, numerical optimization.
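A minimal sketch of the two-step procedure for the illustrative normal-moment model used earlier; the data, function names, and optimizer choice are assumptions of this example:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

def g(y, theta):
    """Illustrative moment conditions for (mu, sigma2) of a normal sample."""
    mu, sigma2 = theta
    return np.column_stack([y - mu,
                            (y - mu)**2 - sigma2,
                            y**3 - mu**3 - 3*mu*sigma2])

def objective(theta, y, W):
    m_hat = g(y, theta).mean(axis=0)
    return m_hat @ W @ m_hat

def two_step_gmm(y, theta0):
    # Step 1: consistent but inefficient estimate using W = I.
    step1 = minimize(objective, theta0, args=(y, np.eye(3)), method="Nelder-Mead")
    # Estimate the efficient weighting matrix from the first-step moments (iid case).
    g1 = g(y, step1.x)
    W_hat = np.linalg.inv(g1.T @ g1 / len(y))
    # Step 2: re-minimize with the estimated efficient weighting matrix.
    step2 = minimize(objective, step1.x, args=(y, W_hat), method="Nelder-Mead")
    return step2.x, W_hat

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.5, size=2000)
theta_hat, W_hat = two_step_gmm(y, theta0=np.array([0.0, 1.0]))
print("two-step GMM estimate:", theta_hat)
</syntaxhighlight>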


Sargan–Hansen ''J''-test

When the number of moment conditions is greater than the dimension of the parameter vector ''θ'', the model is said to be ''over-identified''. Sargan (1958) proposed tests for over-identifying restrictions based on instrumental variables estimators that are distributed in large samples as chi-square variables with degrees of freedom that depend on the number of over-identifying restrictions. Subsequently, Hansen (1982) applied this test to the mathematically equivalent formulation of GMM estimators. Note, however, that such statistics can be negative in empirical applications where the models are misspecified, and likelihood ratio tests can yield insights since the models are estimated under both null and alternative hypotheses (Bhargava and Sargan, 1983).

Conceptually we can check whether \hat{m}(\hat\theta) is sufficiently close to zero to suggest that the model fits the data well. The GMM method has then replaced the problem of solving the equation \hat{m}(\theta)=0, which chooses \theta to match the restrictions exactly, by a minimization calculation. The minimization can always be conducted even when no \theta_0 exists such that m(\theta_0)=0. This is what the J-test does. The J-test is also called a ''test for over-identifying restrictions''.

Formally we consider two hypotheses:
* H_0:\ m(\theta_0)=0  (the null hypothesis that the model is “valid”), and
* H_1:\ m(\theta)\neq 0,\ \forall \theta\in\Theta  (the alternative hypothesis that the model is “invalid”; the data do not come close to meeting the restrictions).

Under hypothesis H_0, the following so-called J-statistic is asymptotically ''chi-squared'' distributed with ''k''–''l'' degrees of freedom. Define ''J'' to be

: J \equiv T \cdot \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\bigg)^{\mathsf{T}} \hat{W}_T \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\bigg)\ \xrightarrow{d}\ \chi^2_{k-l}   under H_0,

where \hat\theta is the GMM estimator of the parameter \theta_0, ''k'' is the number of moment conditions (dimension of vector ''g''), and ''l'' is the number of estimated parameters (dimension of vector ''θ''). The matrix \hat{W}_T must converge in probability to \Omega^{-1}, the efficient weighting matrix (note that previously we only required that ''W'' be proportional to \Omega^{-1} for the estimator to be efficient; however, in order to conduct the J-test ''W'' must be exactly equal to \Omega^{-1}, not simply proportional).

Under the alternative hypothesis H_1, the J-statistic is asymptotically unbounded:

: J\ \xrightarrow{p}\ \infty   under H_1.

To conduct the test we compute the value of ''J'' from the data. It is a nonnegative number. We compare it with (for example) the 0.95 quantile of the \chi^2_{k-l} distribution:
* H_0 is rejected at the 95% confidence level if J > q_{0.95}^{\chi^2_{k-l}};
* H_0 cannot be rejected at the 95% confidence level if J < q_{0.95}^{\chi^2_{k-l}}.
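For the over-identified toy model used in the earlier sketches (three moment conditions, two parameters, so ''k'' − ''l'' = 1), the J-statistic and its p-value could be computed roughly as follows; the model, data, and helper names are assumptions of this illustration:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import chi2
from scipy.optimize import minimize

def g(y, theta):
    """Illustrative moment conditions: k = 3 moments, l = 2 parameters (over-identified by 1)."""
    mu, sigma2 = theta
    return np.column_stack([y - mu,
                            (y - mu)**2 - sigma2,
                            y**3 - mu**3 - 3*mu*sigma2])

def objective(theta, y, W):
    m_hat = g(y, theta).mean(axis=0)
    return m_hat @ W @ m_hat

rng = np.random.default_rng(2)
y = rng.normal(2.0, 1.5, size=2000)

# Two-step GMM to obtain theta_hat and the efficient weighting matrix estimate.
step1 = minimize(objective, np.array([0.0, 1.0]), args=(y, np.eye(3)), method="Nelder-Mead")
g1 = g(y, step1.x)
W_hat = np.linalg.inv(g1.T @ g1 / len(y))      # must converge to Omega^{-1} for the J-test
step2 = minimize(objective, step1.x, args=(y, W_hat), method="Nelder-Mead")

# J = T * m_hat' W_hat m_hat, asymptotically chi^2 with k - l degrees of freedom under H0.
m_hat = g(y, step2.x).mean(axis=0)
J = len(y) * m_hat @ W_hat @ m_hat
dof = 3 - 2
print("J =", J, " p-value =", 1 - chi2.cdf(J, dof))
</syntaxhighlight>

With correctly specified moments, ''J'' should be small relative to the \chi^2_1 critical value; a large ''J'' signals rejection of the over-identifying restrictions.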


Scope

Many other popular estimation techniques can be cast in terms of GMM optimization, including ordinary least squares, instrumental variables regression, non-linear least squares, and maximum likelihood estimation.
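For instance, ordinary least squares corresponds to the just-identified moment conditions \operatorname{E}[x_t(y_t - x_t^{\mathsf{T}}\beta)] = 0, so the GMM estimator solves the sample moments exactly and coincides with the usual OLS formula. A minimal sketch with simulated data (an assumption of this illustration):

<syntaxhighlight lang="python">
import numpy as np

# OLS as GMM: the moment conditions E[x_t (y_t - x_t' beta)] = 0 are just-identified,
# so the GMM estimator sets the sample moments exactly to zero and equals the OLS estimator.
rng = np.random.default_rng(3)
T = 500
x = np.column_stack([np.ones(T), rng.normal(size=T)])   # intercept plus one regressor
beta_true = np.array([1.0, 2.0])
y = x @ beta_true + rng.normal(size=T)

# Setting the sample moments (1/T) sum_t x_t (y_t - x_t' beta) to zero gives
# beta_hat = (X'X)^{-1} X'y, i.e. the OLS formula.
beta_gmm = np.linalg.solve(x.T @ x, x.T @ y)
print("GMM/OLS estimate:", beta_gmm)
</syntaxhighlight>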


An alternative to the GMM

The article on the method of moments describes an alternative to the original (non-generalized) method of moments (MoM), with references to some applications and a list of theoretical advantages and disadvantages relative to the traditional method. This Bayesian-like MoM (BL-MoM) is distinct from all of the related methods described above, which are subsumed by the GMM. The literature does not contain a direct comparison between the GMM and the BL-MoM in specific applications.


Implementations

* R Programming wikibook, Method of Moments
* R
* Stata


See also

* Method of maximum likelihood
* Generalized empirical likelihood
* Arellano–Bond estimator
* Approximate Bayesian computation


References


Further reading

* Huber, P. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. ''Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability'' 1, 221–233.
* Newey, W.; McFadden, D. (1994). Large sample estimation and hypothesis testing. In ''Handbook of Econometrics'', Ch. 36. Elsevier Science.
* Sargan, J.D. (1958). The estimation of economic relationships using instrumental variables. ''Econometrica'', 26, 393–415.
* Sargan, J.D. (1959). The estimation of relationships with autocorrelated residuals by the use of instrumental variables. ''Journal of the Royal Statistical Society B'', 21, 91–105.
* Wang, C.Y.; Wang, S.; Carroll, R. (1997). Estimation in choice-based sampling with measurement error and bootstrap analysis. ''Journal of Econometrics'', 77, 65–86.
* Bhargava, A.; Sargan, J.D. (1983). Estimating dynamic random effects from panel data covering short time periods. ''Econometrica'', 51, 6, 1635–1659.
* Faciane, Kirby Adam Jr. (2006). ''Statistics for Empirical and Quantitative Finance''. H.C. Baird. ISBN 0-9788208-9-4.
* Special issues of ''Journal of Business and Economic Statistics'': vol. 14, no. 3 and vol. 20, no. 4.

Short Introduction to the Generalized Method of Moments