In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared estimate of errors (SSE), is the sum of the squares of residuals (deviations of predicted from actual empirical values of data). It is a measure of the discrepancy between the data and an estimation model, such as a linear regression. A small RSS indicates a tight fit of the model to the data. It is used as an optimality criterion in parameter selection and model selection.

In general, total sum of squares = explained sum of squares + residual sum of squares. For a proof of this in the multivariate ordinary least squares (OLS) case, see partitioning in the general OLS model.
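
As a numerical illustration of this decomposition, the following sketch (a minimal example assuming NumPy; the data and variable names are invented for illustration) fits an ordinary least squares line with an intercept and checks that the total sum of squares equals the explained plus the residual sum of squares:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    y = 2.0 + 3.0 * x + rng.normal(size=50)           # synthetic data

    X = np.column_stack([np.ones_like(x), x])         # design matrix with intercept column
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficient estimates
    y_hat = X @ beta_hat                              # fitted values

    tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
    ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    rss = np.sum((y - y_hat) ** 2)         # residual sum of squares

    assert np.isclose(tss, ess + rss)      # TSS = ESS + RSS

Note that the decomposition relies on the model containing an intercept (constant) term; without one, the cross term between fitted values and residuals need not vanish.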


One explanatory variable

In a model with a single explanatory variable, RSS is given by:

:\operatorname{RSS} = \sum_{i=1}^n (y_i - f(x_i))^2

where y_i is the ith value of the variable to be predicted, x_i is the ith value of the explanatory variable, and f(x_i) is the predicted value of y_i (also termed \hat{y}_i). In a standard linear simple regression model, y_i = \alpha + \beta x_i + \varepsilon_i, where \alpha and \beta are coefficients, y and x are the regressand and the regressor, respectively, and \varepsilon is the error term. The sum of squares of residuals is the sum of squares of \widehat{\varepsilon}_i; that is,

:\operatorname{RSS} = \sum_{i=1}^n (\widehat{\varepsilon}_i)^2 = \sum_{i=1}^n \left(y_i - (\widehat{\alpha} + \widehat{\beta} x_i)\right)^2

where \widehat{\alpha} is the estimated value of the constant term \alpha and \widehat{\beta} is the estimated value of the slope coefficient \beta.
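
A minimal sketch of this computation, assuming NumPy and using the standard closed-form least-squares estimates for the slope and intercept (the helper name rss_simple is hypothetical):

    import numpy as np

    def rss_simple(x, y):
        """RSS for the fitted simple regression y_i = alpha_hat + beta_hat * x_i."""
        beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        alpha_hat = y.mean() - beta_hat * x.mean()
        residuals = y - (alpha_hat + beta_hat * x)   # estimated errors
        return np.sum(residuals ** 2)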


Matrix expression for the OLS residual sum of squares

The general regression model with n observations and k explanators, the first of which is a constant unit vector whose coefficient is the regression intercept, is

:y = X \beta + e

where y is an n × 1 vector of dependent variable observations, each column of the n × k matrix X is a vector of observations on one of the k explanators, \beta is a k × 1 vector of true coefficients, and e is an n × 1 vector of the true underlying errors. The ordinary least squares estimator for \beta is

:X \hat{\beta} = y \iff X^\operatorname{T} X \hat{\beta} = X^\operatorname{T} y \iff \hat{\beta} = (X^\operatorname{T} X)^{-1} X^\operatorname{T} y.

The residual vector is \hat{e} = y - X \hat{\beta} = y - X (X^\operatorname{T} X)^{-1} X^\operatorname{T} y, so the residual sum of squares is

:\operatorname{RSS} = \hat{e}^\operatorname{T} \hat{e} = \|\hat{e}\|^2

(equivalent to the square of the norm of residuals). In full:

:\operatorname{RSS} = y^\operatorname{T} y - y^\operatorname{T} X (X^\operatorname{T} X)^{-1} X^\operatorname{T} y = y^\operatorname{T} [I - X (X^\operatorname{T} X)^{-1} X^\operatorname{T}] y = y^\operatorname{T} [I - H] y,

where H is the hat matrix, or the projection matrix in linear regression.
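
These matrix formulas translate directly into code. The sketch below assumes NumPy and a full-column-rank X, and the helper names are illustrative; it solves the normal equations rather than forming the inverse explicitly, which is the numerically preferred route:

    import numpy as np

    def rss_matrix(X, y):
        """RSS = e_hat' e_hat, with beta_hat solving X'X beta_hat = X'y."""
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations
        e_hat = y - X @ beta_hat                      # residual vector
        return e_hat @ e_hat                          # squared norm of residuals

    def rss_hat(X, y):
        """Equivalent form RSS = y'(I - H)y with hat matrix H = X (X'X)^{-1} X'."""
        H = X @ np.linalg.solve(X.T @ X, X.T)
        return y @ (np.eye(len(y)) - H) @ y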


Relation with Pearson's product-moment correlation

The least-squares regression line is given by

:y = ax + b,

where b = \bar{y} - a\bar{x} and a = \frac{S_{xy}}{S_{xx}}, with S_{xy} = \sum_{i=1}^n (\bar{x} - x_i)(\bar{y} - y_i) and S_{xx} = \sum_{i=1}^n (\bar{x} - x_i)^2. Therefore,

:\begin{align}
\operatorname{RSS} &= \sum_{i=1}^n (y_i - f(x_i))^2 = \sum_{i=1}^n (y_i - (a x_i + b))^2 = \sum_{i=1}^n (y_i - a x_i - \bar{y} + a\bar{x})^2 \\
&= \sum_{i=1}^n \left(a(\bar{x} - x_i) - (\bar{y} - y_i)\right)^2 = a^2 S_{xx} - 2a S_{xy} + S_{yy} = S_{yy} - a S_{xy} = S_{yy}\left(1 - \frac{S_{xy}^2}{S_{xx} S_{yy}}\right)
\end{align}

where S_{yy} = \sum_{i=1}^n (\bar{y} - y_i)^2. The Pearson product-moment correlation is given by r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}; therefore, \operatorname{RSS} = S_{yy}(1 - r^2).
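
The identity RSS = S_{yy}(1 - r^2) is easy to verify numerically; a short sketch assuming NumPy, with made-up data for illustration:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

    sxx = np.sum((x.mean() - x) ** 2)
    syy = np.sum((y.mean() - y) ** 2)
    sxy = np.sum((x.mean() - x) * (y.mean() - y))

    a = sxy / sxx                    # least-squares slope
    b = y.mean() - a * x.mean()      # least-squares intercept
    rss = np.sum((y - (a * x + b)) ** 2)

    r = sxy / np.sqrt(sxx * syy)     # Pearson product-moment correlation
    assert np.isclose(rss, syy * (1 - r ** 2))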


See also

* Akaike information criterion#Comparison with least squares
* Chi-squared distribution#Applications
* Degrees of freedom (statistics)#Sum of squares and degrees of freedom
* Errors and residuals in statistics
* Lack-of-fit sum of squares
* Mean squared error
* Reduced chi-squared statistic, RSS per degree of freedom
* Squared deviations
* Sum of squares (statistics)


References

* Draper, N.R.; Smith, H. (1998). Applied Regression Analysis (3rd ed.). John Wiley. ISBN 0-471-17082-8.