Linear least squares (LLS) is the
least squares approximation
The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the res ...
of
linear functions to data.
It is a set of formulations for solving statistical problems involved in
linear regression
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is call ...
, including variants for
ordinary
Ordinary or The Ordinary often refer to:
Music
* ''Ordinary'' (EP) (2015), by South Korean group Beast
* ''Ordinary'' (Every Little Thing album) (2011)
* "Ordinary" (Two Door Cinema Club song) (2016)
* "Ordinary" (Wayne Brady song) (2008)
* ...
(unweighted),
weighted
A weight function is a mathematical device used when performing a sum, integral, or average to give some elements more "weight" or influence on the result than other elements in the same set. The result of this application of a weight function is ...
, and
generalized (correlated)
residuals.
Numerical methods for linear least squares Numerical methods for linear least squares entails the numerical analysis of linear least squares problems.
Introduction
A general approach to the least squares problem \operatorname \, \big\, \mathbf y - X \boldsymbol \beta \big\, ^2 can be descri ...
include inverting the matrix of the normal equations and
orthogonal decomposition methods.
Main formulations
The three main linear least squares formulations are:
*
Ordinary least squares
In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model (with fixed level-one effects of a linear function of a set of explanatory variables) by the prin ...
(OLS) is the most common estimator. OLS estimates are commonly used to analyze both
experimental and
observational
Observation is the active acquisition of information from a primary source. In living beings, observation employs the senses. In science, observation can also involve the perception and recording of data via the use of scientific instruments. The ...
data. The OLS method minimizes the sum of squared
residuals, and leads to a closed-form expression for the estimated value of the unknown parameter vector ''β'':
where
is a vector whose ''i''th element is the ''i''th observation of the
dependent variable
Dependent and independent variables are variables in mathematical modeling, statistical modeling and experimental sciences. Dependent variables receive this name because, in an experiment, their values are studied under the supposition or demand ...
, and
is a matrix whose ''ij'' element is the ''i''th observation of the ''j''th
independent variable
Dependent and independent variables are variables in mathematical modeling, statistical modeling and experimental sciences. Dependent variables receive this name because, in an experiment, their values are studied under the supposition or demand ...
. The estimator is
unbiased and
consistent if the errors have finite variance and are uncorrelated with the regressors:
where
is the transpose of row ''i'' of the matrix
It is also
efficient under the assumption that the errors have finite variance and are
homoscedastic, meaning that E
''i''2x''i''">'ε''''i''2x''i''does not depend on ''i''. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data, it is difficult to exclude the possibility of an omitted covariate ''z'' that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the response variable, and hence to an inconsistent estimator of β. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if
multicollinearity is present, unless the sample size is large.
*
Weighted least squares (WLS) are used when
heteroscedasticity
In statistics, a sequence (or a vector) of random variables is homoscedastic () if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The s ...
is present in the error terms of the model.
*
Generalized least squares (GLS) is an extension of the OLS method, that allows efficient estimation of ''β'' when either
heteroscedasticity
In statistics, a sequence (or a vector) of random variables is homoscedastic () if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The s ...
, or correlations, or both are present among the error terms of the model, as long as the form of heteroscedasticity and correlation is known independently of the data. To handle heteroscedasticity when the error terms are uncorrelated with each other, GLS minimizes a weighted analogue to the sum of squared residuals from OLS regression, where the weight for the ''i''
th case is inversely proportional to var(''ε''
''i''). This special case of GLS is called "weighted least squares". The GLS solution to an estimation problem is
where Ω is the covariance matrix of the errors. GLS can be viewed as applying a linear transformation to the data so that the assumptions of OLS are met for the transformed data. For GLS to be applied, the covariance structure of the errors must be known up to a multiplicative constant.
Alternative formulations
Other formulations include:
*
Iteratively reweighted least squares
The method of iteratively reweighted least squares (IRLS) is used to solve certain optimization problems with objective functions of the form of a ''p''-norm:
:\underset \sum_^n \big, y_i - f_i (\boldsymbol\beta) \big, ^p,
by an iterative met ...
(IRLS) is used when
heteroscedasticity
In statistics, a sequence (or a vector) of random variables is homoscedastic () if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The s ...
, or correlations, or both are present among the error terms of the model, but where little is known about the covariance structure of the errors independently of the data. In the first iteration, OLS, or GLS with a provisional covariance structure is carried out, and the residuals are obtained from the fit. Based on the residuals, an improved estimate of the covariance structure of the errors can usually be obtained. A subsequent GLS iteration is then performed using this estimate of the error structure to define the weights. The process can be iterated to convergence, but in many cases, only one iteration is sufficient to achieve an efficient estimate of ''β''.
*
Instrumental variables
In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to ...
regression (IV) can be performed when the regressors are correlated with the errors. In this case, we need the existence of some auxiliary ''instrumental variables'' z
''i'' such that E
''i''''ε''''i''">''z''i''''ε''''i''nbsp;= 0. If Z is the matrix of instruments, then the estimator can be given in closed form as
Optimal instruments regression is an extension of classical IV regression to the situation where .
*
Total least squares
In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generaliza ...
(TLS) is an approach to least squares estimation of the linear regression model that treats the covariates and response variable in a more geometrically symmetric manner than OLS. It is one approach to handling the "errors in variables" problem, and is also sometimes used even when the covariates are assumed to be error-free.
*Percentage least squares focuses on reducing percentage errors, which is useful in the field of forecasting or time series analysis. It is also useful in situations where the dependent variable has a wide range without constant variance, as here the larger residuals at the upper end of the range would dominate if OLS were used. When the percentage or relative error is normally distributed, least squares percentage regression provides maximum likelihood estimates. Percentage regression is linked to a multiplicative error model, whereas OLS is linked to models containing an additive error term.
*
Constrained least squares, indicates a linear least squares problem with additional constraints on the solution.
Objective function
In OLS (i.e., assuming unweighted observations), the
optimal value of the
objective function is found by substituting the optimal expression for the coefficient vector:
where
, the latter equality holding since
is symmetric and idempotent. It can be shown from this that under an appropriate assignment of weights the
expected value
In probability theory, the expected value (also called expectation, expectancy, mathematical expectation, mean, average, or first moment) is a generalization of the weighted average. Informally, the expected value is the arithmetic mean of a l ...
of ''S'' is ''m'' − ''n''. If instead unit weights are assumed, the expected value of ''S'' is
, where
is the variance of each observation.
If it is assumed that the residuals belong to a normal distribution, the objective function, being a sum of weighted squared residuals, will belong to a
chi-squared distribution with ''m'' − ''n''
degrees of freedom
Degrees of freedom (often abbreviated df or DOF) refers to the number of independent variables or parameters of a thermodynamic system. In various scientific fields, the word "freedom" is used to describe the limits to which physical movement or ...
. Some illustrative percentile values of
are given in the following table.
These values can be used for a statistical criterion as to the
goodness of fit. When unit weights are used, the numbers should be divided by the variance of an observation.
For WLS, the ordinary objective function above is replaced for a weighted average of residuals.
Discussion
In
statistics
Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
and
mathematics
Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...
, linear least squares is an approach to fitting a
mathematical or
statistical model
A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of Sample (statistics), sample data (and similar data from a larger Statistical population, population). A statistical model repres ...
to
data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown
parameters of the model. The resulting fitted model can be used to
summarize the data, to
predict
A prediction (Latin ''præ-'', "before," and ''dicere'', "to say"), or forecast, is a statement about a future event or data. They are often, but not always, based upon experience or knowledge. There is no universal agreement about the exact ...
unobserved values from the same system, and to understand the mechanisms that may underlie the system.
Mathematically, linear least squares is the problem of approximately solving an
overdetermined system of linear equations A x = b, where b is not an element of the
column space of the matrix A. The approximate solution is realized as an exact solution to A x = b', where b' is the projection of b onto the column space of A. The best approximation is then that which minimizes the sum of squared differences between the data values and their corresponding modeled values. The approach is called ''linear'' least squares since the assumed function is linear in the parameters to be estimated. Linear least squares problems are
convex and have a
closed-form solution that is unique, provided that the number of data points used for fitting equals or exceeds the number of unknown parameters, except in special degenerate situations. In contrast,
non-linear least squares problems generally must be solved by an
iterative procedure, and the problems can be non-convex with multiple optima for the objective function. If prior distributions are available, then even an underdetermined system can be solved using the
Bayesian MMSE estimator.
In statistics, linear least squares problems correspond to a particularly important type of
statistical model
A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of Sample (statistics), sample data (and similar data from a larger Statistical population, population). A statistical model repres ...
called
linear regression
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is call ...
which arises as a particular form of
regression analysis. One basic form of such a model is an
ordinary least squares
In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model (with fixed level-one effects of a linear function of a set of explanatory variables) by the prin ...
model. The present article concentrates on the mathematical aspects of linear least squares problems, with discussion of the formulation and interpretation of statistical regression models and
statistical inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution, distribution of probability.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical ...
s related to these being dealt with in the articles just mentioned. See
outline of regression analysis
The following outline is provided as an overview of and topical guide to regression analysis:
Regression analysis – use of statistical techniques for learning about the relationship between one or more dependent variables (''Y'') and one ...
for an outline of the topic.
Properties
If the experimental errors,
, are uncorrelated, have a mean of zero and a constant variance,
, the
Gauss–Markov theorem states that the least-squares estimator,
, has the minimum variance of all estimators that are linear combinations of the observations. In this sense it is the best, or optimal, estimator of the parameters. Note particularly that this property is independent of the statistical
distribution function of the errors. In other words, ''the distribution function of the errors need not be a
normal distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu ...
''. However, for some probability distributions, there is no guarantee that the least-squares solution is even possible given the observations; still, in such cases it is the best estimator that is both linear and unbiased.
For example, it is easy to show that the
arithmetic mean
In mathematics and statistics, the arithmetic mean ( ) or arithmetic average, or just the ''mean'' or the ''average'' (when the context is clear), is the sum of a collection of numbers divided by the count of numbers in the collection. The colle ...
of a set of measurements of a quantity is the least-squares estimator of the value of that quantity. If the conditions of the Gauss–Markov theorem apply, the arithmetic mean is optimal, whatever the distribution of errors of the measurements might be.
However, in the case that the experimental errors do belong to a normal distribution, the least-squares estimator is also a
maximum likelihood estimator.
These properties underpin the use of the method of least squares for all types of data fitting, even when the assumptions are not strictly valid.
Limitations
An assumption underlying the treatment given above is that the independent variable, ''x'', is free of error. In practice, the errors on the measurements of the independent variable are usually much smaller than the errors on the dependent variable and can therefore be ignored. When this is not the case,
total least squares
In applied statistics, total least squares is a type of errors-in-variables regression, a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It is a generaliza ...
or more generally
errors-in-variables models, or ''rigorous least squares'', should be used. This can be done by adjusting the weighting scheme to take into account errors on both the dependent and independent variables and then following the standard procedure.
In some cases the (weighted) normal equations matrix ''X''
T''X'' is
ill-conditioned. When fitting polynomials the normal equations matrix is a
Vandermonde matrix. Vandermonde matrices become increasingly ill-conditioned as the order of the matrix increases. In these cases, the least squares estimate amplifies the measurement noise and may be grossly inaccurate. Various
regularization
Regularization may refer to:
* Regularization (linguistics)
* Regularization (mathematics)
* Regularization (physics)
* Regularization (solid modeling)
* Regularization Law, an Israeli law intended to retroactively legalize settlements
See also ...
techniques can be applied in such cases, the most common of which is called
ridge regression. If further information about the parameters is known, for example, a range of possible values of
, then various techniques can be used to increase the stability of the solution. For example, see
constrained least squares.
Another drawback of the least squares estimator is the fact that the norm of the residuals,
is minimized, whereas in some cases one is truly interested in obtaining small error in the parameter
, e.g., a small value of
. However, since the true parameter
is necessarily unknown, this quantity cannot be directly minimized. If a
prior probability on
is known, then a
Bayes estimator
In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function (i.e., the posterior expected loss). Equivalently, it maximizes the po ...
can be used to minimize the
mean squared error,
. The least squares method is often applied when no prior is known. Surprisingly, when several parameters are being estimated jointly, better estimators can be constructed, an effect known as
Stein's phenomenon
In decision theory and estimation theory, Stein's example (also known as Stein's phenomenon or Stein's paradox) is the observation that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on av ...
. For example, if the measurement error is
Gaussian
Carl Friedrich Gauss (1777–1855) is the eponym of all of the topics listed below.
There are over 100 topics all named after this German mathematician and scientist, all in the fields of mathematics, physics, and astronomy. The English eponymo ...
, several estimators are known which
dominate, or outperform, the least squares technique; the best known of these is the
James–Stein estimator. This is an example of more general
shrinkage estimator
In statistics, shrinkage is the reduction in the effects of sampling variation. In regression analysis, a fitted relationship appears to perform less well on a new data set than on the data set used for fitting. In particular the value of the coeff ...
s that have been applied to regression problems.
Applications
*
Polynomial fitting: models are
polynomials in an independent variable, ''x'':
** Straight line:
.
** Quadratic:
.
** Cubic, quartic and higher polynomials. For
regression with high-order polynomials, the use of
orthogonal polynomials is recommended.
*
Numerical smoothing and differentiation — this is an application of polynomial fitting.
* Multinomials in more than one independent variable, including surface fitting
* Curve fitting with
B-spline
In the mathematical subfield of numerical analysis, a B-spline or basis spline is a spline function that has minimal support with respect to a given degree, smoothness, and domain partition. Any spline function of given degree can be expresse ...
s
[
* Chemometrics, Calibration curve, ]Standard addition
The method of standard addition is a type of quantitative analysis approach often used in analytical chemistry whereby the standard is added directly to the aliquots of analyzed sample. This method is used in situations where sample matrix also ...
, Gran plot, analysis of mixtures
Uses in data fitting
The primary application of linear least squares is in data fitting
Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is ...
. Given a set of ''m'' data points consisting of experimentally measured values taken at ''m'' values of an independent variable ( may be scalar or vector quantities), and given a model function with it is desired to find the parameters such that the model function "best" fits the data. In linear least squares, linearity is meant to be with respect to parameters so
Here, the functions may be nonlinear with respect to the variable x.
Ideally, the model function fits the data exactly, so
for all This is usually not possible in practice, as there are more data points than there are parameters to be determined. The approach chosen then is to find the minimal possible value of the sum of squares of the residuals
so to minimize the function
After substituting for and then for , this minimization problem becomes the quadratic minimization problem above with
and the best fit can be found by solving the normal equations.
Example
As a result of an experiment, four data points were obtained, and (shown in red in the diagram on the right). We hope to find a line that best fits these four points. In other words, we would like to find the numbers and that approximately solve the overdetermined linear system:
of four equations in two unknowns in some "best" sense.
represents the residual, at each point, between the curve fit and the data:
The least squares
The method of least squares is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the res ...
approach to solving this problem is to try to make the sum of the squares of these residuals as small as possible; that is, to find the minimum of the function:
The minimum is determined by calculating the partial derivative
In mathematics, a partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). Part ...
s of with respect to and and setting them to zero:
This results in a system of two equations in two unknowns, called the normal equations, which when solved give:
and the equation is the line of best fit. The residuals, that is, the differences between the values from the observations and the predicated variables by using the line of best fit, are then found to be and (see the diagram on the right). The minimum value of the sum of squares of the residuals is
More generally, one can have regressors , and a linear model
Using a quadratic model
Importantly, in "linear least squares", we are not restricted to using a line as the model as in the above example. For instance, we could have chosen the restricted quadratic model . This model is still linear in the parameter, so we can still perform the same analysis, constructing a system of equations from the data points:
The partial derivatives with respect to the parameters (this time there is only one) are again computed and set to 0:
and solved
leading to the resulting best fit model
See also
* Line-line intersection#Nearest point to non-intersecting lines, an application
* Line fitting
* Nonlinear least squares
* Regularized least squares
* Simple linear regression
In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the ''x'' and ...
* Partial least squares regression
* Linear function
References
Further reading
*
External links
Least Squares Fitting – From MathWorld
{{Least Squares and Regression Analysis
Broad-concept articles
Least squares
Computational statistics