In statistics, generalized least squares (GLS) is a technique for estimating the unknown parameters in a linear regression model when there is a certain degree of correlation between the residuals of the model. In these cases, ordinary least squares and weighted least squares can be statistically inefficient, or even give misleading inferences. GLS was first described by Alexander Aitken in 1936.
Method outline
In standard linear regression models we observe data $\{y_i, x_{ij}\}_{i=1,\dots,n,\ j=2,\dots,k}$ on ''n'' statistical units. The response values are placed in a vector $\mathbf{y} = (y_1, \dots, y_n)^{\mathsf{T}}$, and the predictor values are placed in the design matrix $\mathbf{X} = (\mathbf{x}_1^{\mathsf{T}}, \dots, \mathbf{x}_n^{\mathsf{T}})^{\mathsf{T}}$, where $\mathbf{x}_i = (1, x_{i2}, \dots, x_{ik})$ is a vector of the ''k'' predictor variables (including a constant) for the ''i''th unit. The model forces the conditional mean of $\mathbf{y}$ given $\mathbf{X}$ to be a linear function of $\mathbf{X}$, and assumes the conditional variance of the error term given $\mathbf{X}$ is a ''known'' nonsingular ''covariance matrix'' $\mathbf{\Omega}$. This is usually written as
: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \operatorname{E}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = 0, \quad \operatorname{Cov}[\boldsymbol{\varepsilon} \mid \mathbf{X}] = \mathbf{\Omega}.$
Here $\boldsymbol{\beta} \in \mathbb{R}^k$ is a vector of unknown constants (known as “regression coefficients”) that must be estimated from the data.
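Suppose $\mathbf{b}$ is a candidate estimate for $\boldsymbol{\beta}$. Then the residual vector for $\mathbf{b}$ will be $\mathbf{y} - \mathbf{X}\mathbf{b}$. The generalized least squares method estimates $\boldsymbol{\beta}$ by minimizing the squared Mahalanobis length of this residual vector:
: $\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{argmin}}\, (\mathbf{y} - \mathbf{X}\mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\mathbf{b}) = \underset{\mathbf{b}}{\operatorname{argmin}}\, \mathbf{y}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} + (\mathbf{X}\mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X}\mathbf{b} - \mathbf{y}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X}\mathbf{b} - (\mathbf{X}\mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y},$
where the last two terms evaluate to scalars, resulting in
: $\hat{\boldsymbol{\beta}} = \underset{\mathbf{b}}{\operatorname{argmin}}\, \mathbf{y}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} + \mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X}\mathbf{b} - 2\, \mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y}.$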
This objective is a quadratic form in $\mathbf{b}$.
Taking the gradient of this quadratic form with respect to $\mathbf{b}$ and equating it to zero (when $\mathbf{b} = \hat{\boldsymbol{\beta}}$) gives
: $2\, \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} \hat{\boldsymbol{\beta}} - 2\, \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y} = 0.$
Therefore, the minimum of the objective function can be computed, yielding the explicit formula:
: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y}.$
The quantity $\mathbf{\Omega}^{-1}$ is known as the ''precision matrix'' (or ''concentration matrix''), a generalization of the diagonal weight matrix.
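As an illustration, the closed-form estimator $(\mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{y}$ can be sketched in NumPy. This is a minimal sketch rather than a reference implementation: the function name `gls`, the simulated data, and the AR(1)-style covariance are assumptions made for the example. Linear solves are used in place of explicit matrix inverses, which is a common numerical choice.

```python
import numpy as np

def gls(X, y, omega):
    """GLS estimate (X' Omega^-1 X)^-1 X' Omega^-1 y, using linear solves."""
    omega_inv_X = np.linalg.solve(omega, X)   # Omega^-1 X
    omega_inv_y = np.linalg.solve(omega, y)   # Omega^-1 y
    return np.linalg.solve(X.T @ omega_inv_X, X.T @ omega_inv_y)

# Hypothetical data: intercept plus one regressor, correlated errors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

# Assumed known covariance: Omega[i, j] = 0.7 ** |i - j|
idx = np.arange(n)
omega = 0.7 ** np.abs(idx[:, None] - idx[None, :])

# Draw errors with covariance Omega and build the response.
eps = np.linalg.cholesky(omega) @ rng.normal(size=n)
y = X @ beta_true + eps

beta_hat = gls(X, y, omega)
```

At the returned `beta_hat`, the gradient condition $\mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = 0$ holds up to floating-point error.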
Properties
The GLS estimator is unbiased, consistent, efficient, and asymptotically normal with $\operatorname{E}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = \boldsymbol{\beta}$ and $\operatorname{Cov}[\hat{\boldsymbol{\beta}} \mid \mathbf{X}] = (\mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X})^{-1}$. GLS is equivalent to applying ordinary least squares to a linearly transformed version of the data. To see this, factor $\mathbf{\Omega} = \mathbf{C}\mathbf{C}^{\mathsf{T}}$, for instance using the Cholesky decomposition. Then if we pre-multiply both sides of the equation $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ by $\mathbf{C}^{-1}$, we get an equivalent linear model $\mathbf{y}^{*} = \mathbf{X}^{*}\boldsymbol{\beta} + \boldsymbol{\varepsilon}^{*}$ where $\mathbf{y}^{*} = \mathbf{C}^{-1}\mathbf{y}$, $\mathbf{X}^{*} = \mathbf{C}^{-1}\mathbf{X}$, and $\boldsymbol{\varepsilon}^{*} = \mathbf{C}^{-1}\boldsymbol{\varepsilon}$. In this model $\operatorname{Var}[\boldsymbol{\varepsilon}^{*} \mid \mathbf{X}] = \mathbf{C}^{-1} \mathbf{\Omega} (\mathbf{C}^{-1})^{\mathsf{T}} = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. Thus we can efficiently estimate $\boldsymbol{\beta}$ by applying ordinary least squares (OLS) to the transformed data, which requires minimizing
: $(\mathbf{y}^{*} - \mathbf{X}^{*}\mathbf{b})^{\mathsf{T}} (\mathbf{y}^{*} - \mathbf{X}^{*}\mathbf{b}) = (\mathbf{y} - \mathbf{X}\mathbf{b})^{\mathsf{T}} \mathbf{\Omega}^{-1} (\mathbf{y} - \mathbf{X}\mathbf{b}).$
This has the effect of standardizing the scale of the errors and “de-correlating” them. Since OLS is applied to data with homoscedastic, uncorrelated errors, the Gauss–Markov theorem applies, and therefore the GLS estimate is the best linear unbiased estimator for ''β''.
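The equivalence between GLS and OLS on transformed data can be checked numerically. The sketch below assumes a factorization $\mathbf{\Omega} = \mathbf{C}\mathbf{C}^{\mathsf{T}}$ obtained from `numpy.linalg.cholesky`; the simulated data and all variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# An arbitrary symmetric positive-definite covariance matrix (assumption).
A = rng.normal(size=(n, n))
omega = A @ A.T + n * np.eye(n)

# Lower-triangular Cholesky factor C with Omega = C C^T.
C = np.linalg.cholesky(omega)
y = X @ np.array([0.5, -1.0]) + C @ rng.normal(size=n)

# Transform the data: X* = C^-1 X and y* = C^-1 y.
X_star = np.linalg.solve(C, X)
y_star = np.linalg.solve(C, y)

# OLS on the transformed data.
beta_ols_transformed, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Direct GLS on the original data, for comparison.
beta_gls = np.linalg.solve(X.T @ np.linalg.solve(omega, X),
                           X.T @ np.linalg.solve(omega, y))
```

The two coefficient vectors agree up to floating-point error, because the transformed errors have identity covariance.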
Weighted least squares
A special case of GLS called weighted least squares (WLS) occurs when all the off-diagonal entries of ''Ω'' are 0. This situation arises when the variances of the observed values are unequal (i.e. heteroscedasticity is present), but no correlations exist among the errors of different observations. The weight for unit ''i'' is proportional to the reciprocal of the variance of the response for unit ''i''.
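A short sketch of this special case follows. It assumes the error variances are known and, purely for illustration, sets them equal to the square of the regressor; the data and variable names are hypothetical. The WLS estimate computed from reciprocal-variance weights matches the GLS estimate computed with a diagonal ''Ω''.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])

# Assumed known error variances, one per unit (hypothetical choice).
sigma2 = x ** 2
y = X @ np.array([1.0, 3.0]) + np.sqrt(sigma2) * rng.normal(size=n)

# Weights are the reciprocals of the variances.
w = 1.0 / sigma2

# WLS normal equations: (X' W X) b = X' W y, with W = diag(w).
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# The same estimate from GLS with a diagonal covariance matrix.
omega = np.diag(sigma2)
beta_gls = np.linalg.solve(X.T @ np.linalg.solve(omega, X),
                           X.T @ np.linalg.solve(omega, y))
```

Because the weights enter both sides of the normal equations, multiplying all of them by the same positive constant leaves the estimate unchanged, which is why they need only be proportional to the reciprocal variances.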
Feasible generalized least squares
If the covariance of the errors $\mathbf{\Omega}$ is unknown, one can get a consistent estimate of $\mathbf{\Omega}$, say $\widehat{\mathbf{\Omega}}$, [Baltagi, B. H. (2008). Econometrics (4th ed.). New York: Springer.] using an implementable version of GLS known as the feasible generalized least squares (FGLS) estimator. In FGLS, modeling proceeds in two stages: (1) the model is estimated by OLS or another consistent (but inefficient) estimator, and the residuals are used to build a consistent estimator of the error covariance matrix (doing so often requires additional constraints on the model; for example, if the errors follow a time series process, a statistician generally needs some theoretical assumptions on this process to ensure that a consistent estimator is available); and (2) using the consistent estimator of the covariance matrix of the errors, one can implement GLS ideas.
Whereas GLS is more efficient than OLS under heteroscedasticity (also spelled heteroskedasticity) or autocorrelation, this is not true for FGLS. The feasible estimator is, provided the error covariance matrix is consistently estimated, ''asymptotically'' more efficient, but for a small or medium-sized sample it can actually be less efficient than OLS. This is why some authors prefer to use OLS and reformulate their inferences by simply considering an alternative estimator for the variance of the estimator that is robust to heteroscedasticity or serial correlation.
But for large samples FGLS is preferred over OLS under heteroskedasticity or serial correlation.
[Greene, W. H. (2003). Econometric Analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.] A cautionary note is that the FGLS estimator is not always consistent. One case in which FGLS might be inconsistent is if there are individual specific fixed effects.
In general this estimator has different properties than GLS. For large samples (i.e., asymptotically) all properties are (under appropriate conditions) common with respect to GLS, but for finite samples the properties of FGLS estimators are unknown: they vary dramatically with each particular model, and as a general rule their exact distributions cannot be derived analytically. For finite samples, FGLS may be even less efficient than OLS in some cases. Thus, while GLS can be made feasible, it is not always wise to apply this method when the sample is small.
A method sometimes used to improve the accuracy of the estimators in finite samples is to iterate, i.e. taking the residuals from FGLS to update the errors covariance estimator, and then updating the FGLS estimation, applying the same idea iteratively until the estimators vary less than some tolerance. But this method does not necessarily improve the efficiency of the estimator very much if the original sample was small.
A reasonable option when samples are not too large is to apply OLS, but to discard the classical variance estimator
: $\sigma^2 (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1}$
(which is inconsistent in this framework) and use a HAC (Heteroskedasticity and Autocorrelation Consistent) estimator instead. For example, in the autocorrelation context we can use the Bartlett estimator (often known as the Newey–West estimator, since these authors popularized its use among econometricians in their 1987 ''Econometrica'' article), and in the heteroskedastic context we can use the Eicker–White estimator. This approach is much safer, and it is the appropriate path to take unless the sample is large, where "large" is sometimes a slippery notion (e.g. if the error distribution is asymmetric, the required sample would be much larger).
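For the heteroskedastic case, the idea can be sketched as follows. The code computes OLS coefficients together with the classical standard errors and the Eicker–White standard errors in their simplest "HC0" sandwich form, $(\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \operatorname{diag}(\hat{u}_i^2)\, \mathbf{X} (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}$. The function name `ols_with_white_se` and the simulated data are assumptions for this example, and the autocorrelation-robust (Newey–West) case is not covered.

```python
import numpy as np

def ols_with_white_se(X, y):
    """OLS coefficients with classical and White (HC0) standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical estimator s^2 (X'X)^-1: relies on homoscedastic,
    # uncorrelated errors.
    s2 = resid @ resid / (n - k)
    cov_classical = s2 * xtx_inv

    # White/HC0 sandwich: (X'X)^-1 X' diag(resid^2) X (X'X)^-1
    meat = X.T @ (resid[:, None] ** 2 * X)
    cov_white = xtx_inv @ meat @ xtx_inv

    return beta, np.sqrt(np.diag(cov_classical)), np.sqrt(np.diag(cov_white))

# Hypothetical heteroskedastic data: error spread grows with |x|.
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + np.abs(x) * rng.normal(size=n)

beta, se_classical, se_white = ols_with_white_se(X, y)
```

In this simulated design the error variance is largest where the regressor is far from its mean, so the classical formula understates the uncertainty of the slope and the robust standard error for the slope comes out larger.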
The ordinary least squares (OLS) estimator is calculated as usual by
: $\hat{\boldsymbol{\beta}}_{\text{OLS}} = (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \mathbf{y}$
and estimates of the residuals $\hat{u}_j = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{\text{OLS}})_j$ are constructed.
For simplicity consider the model for heteroscedastic and not autocorrelated errors. Assume that the variance-covariance matrix $\mathbf{\Omega}$ of the error vector is diagonal, or equivalently that errors from distinct observations are uncorrelated. Then each diagonal entry may be estimated by the fitted residuals $\hat{u}_j$ so $\widehat{\mathbf{\Omega}}_{\text{OLS}}$ may be constructed by
: $\widehat{\mathbf{\Omega}}_{\text{OLS}} = \operatorname{diag}(\hat{\sigma}_1^2, \hat{\sigma}_2^2, \dots, \hat{\sigma}_n^2).$
It is important to notice that the squared residuals cannot be used in the previous expression; we need an estimator of the error variances. To do so, we can use a parametric heteroskedasticity model, or a nonparametric estimator. Once this step is fulfilled, we can proceed:
Estimate $\boldsymbol{\beta}_{\text{FGLS1}}$ using $\widehat{\mathbf{\Omega}}_{\text{OLS}}$ by weighted least squares:
: $\hat{\boldsymbol{\beta}}_{\text{FGLS1}} = (\mathbf{X}^{\mathsf{T}} \widehat{\mathbf{\Omega}}_{\text{OLS}}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \widehat{\mathbf{\Omega}}_{\text{OLS}}^{-1} \mathbf{y}$
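The procedure can be iterated. The first iteration is given by
: $\hat{\mathbf{u}}_{\text{FGLS1}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{\text{FGLS1}}$
: $\widehat{\mathbf{\Omega}}_{\text{FGLS1}} = \operatorname{diag}(\hat{\sigma}_{\text{FGLS1},1}^2, \hat{\sigma}_{\text{FGLS1},2}^2, \dots, \hat{\sigma}_{\text{FGLS1},n}^2)$
: $\hat{\boldsymbol{\beta}}_{\text{FGLS2}} = (\mathbf{X}^{\mathsf{T}} \widehat{\mathbf{\Omega}}_{\text{FGLS1}}^{-1} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}} \widehat{\mathbf{\Omega}}_{\text{FGLS1}}^{-1} \mathbf{y}$
This estimation of $\widehat{\mathbf{\Omega}}$ can be iterated to convergence.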
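The iteration can be sketched in code for the diagonal (heteroscedastic, uncorrelated) case. The sketch assumes one particular parametric variance model, a log-linear specification $\log \sigma_i^2 = \mathbf{x}_i \boldsymbol{\gamma}$ fitted by regressing the log squared residuals on the regressors. This model, the function name `fgls_heteroskedastic`, and the simulated data are illustrative assumptions, and other variance models could be substituted.

```python
import numpy as np

def fgls_heteroskedastic(X, y, n_iter=2):
    """Feasible GLS under an assumed log-linear variance model."""
    # Stage 1: OLS gives a consistent but inefficient starting estimate.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(n_iter):
        resid = y - X @ beta
        # Assumed variance model: log(sigma_i^2) = x_i' gamma, fitted by OLS
        # on the log squared residuals (tiny constant guards against log(0)).
        gamma, *_ = np.linalg.lstsq(X, np.log(resid ** 2 + 1e-12), rcond=None)
        sigma2_hat = np.exp(X @ gamma)   # fitted variances, up to a scale factor
        # Stage 2: weighted least squares with reciprocal-variance weights.
        w = 1.0 / sigma2_hat
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

# Hypothetical data whose error variance follows the assumed model.
rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(0.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.0, 2.0])
sigma = np.exp(0.5 * x)                  # so that log(sigma^2) = x
y = X @ beta_true + sigma * rng.normal(size=n)

beta_fgls = fgls_heteroskedastic(X, y)
```

Fitted variances are used instead of raw squared residuals, in line with the requirement that the diagonal entries be estimates of the error variances. Only the relative sizes of the fitted variances matter for the weighted step, so a constant bias in the fitted log-variance intercept does not affect the coefficients.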
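Under regularity conditions the FGLS estimator (or the estimator from any of its iterations, if we iterate a finite number of times) is asymptotically distributed as
: $\sqrt{n}\,(\hat{\boldsymbol{\beta}}_{\text{FGLS}} - \boldsymbol{\beta}) \ \xrightarrow{d}\ \mathcal{N}(0, \mathbf{V}),$
where ''n'' is the sample size and
: $\mathbf{V} = \operatorname{p-lim}\,(\mathbf{X}^{\mathsf{T}} \mathbf{\Omega}^{-1} \mathbf{X} / n)^{-1}.$
Here p-lim means limit in probability.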
See also
* Confidence region
* Effective degrees of freedom
* Prais–Winsten estimation