Mean squared prediction error
In statistics, the mean squared prediction error (MSPE), also known as the mean squared error of the predictions, of a smoothing, curve fitting, or regression procedure is the expected value of the squared prediction errors (PE), the squared difference between the fitted values implied by the predictive function \widehat{g} and the values of the (unobservable) true function ''g''. It is an inverse measure of the ''explanatory power'' of \widehat{g}, and can be used in the process of cross-validation of an estimated model. Knowledge of ''g'' would be required in order to calculate the MSPE exactly; in practice, the MSPE is estimated.


Formulation

If the smoothing or fitting procedure has projection matrix (i.e., hat matrix) ''L'', which maps the observed values vector y to the predicted values vector \hat{y}=Ly, then PE and MSPE are formulated as:

:\operatorname{PE}_i=g(x_i)-\widehat{g}(x_i),

:\operatorname{MSPE}=\operatorname{E}\left[\operatorname{PE}_i^2\right]=\sum_{i=1}^n \operatorname{PE}_i^2/n.
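For concreteness, the following minimal Python/NumPy sketch builds the hat matrix ''L'' of a simple linear smoother and evaluates PE and MSPE against a known ''g''. The moving-average smoother, sample size, noise level, and true function are hypothetical choices made for illustration, not part of the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: true function g, noisy observations y = g(x) + noise.
n = 100
x = np.linspace(0.0, 1.0, n)
g = np.sin(2.0 * np.pi * x)            # the (normally unobservable) true values
y = g + 0.3 * rng.standard_normal(n)   # observed values

# A simple linear smoother: 5-point moving average.
# Its hat matrix L maps observations y to fitted values y_hat = L @ y.
L = np.zeros((n, n))
for i in range(n):
    window = np.arange(max(0, i - 2), min(n, i + 3))
    L[i, window] = 1.0 / len(window)

y_hat = L @ y                          # fitted values g_hat(x_i)

pe = g - y_hat                         # prediction errors PE_i = g(x_i) - g_hat(x_i)
mspe = np.mean(pe ** 2)                # MSPE = sum of PE_i^2 over n
print(f"MSPE = {mspe:.4f}")
```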
The MSPE can be decomposed into two terms: the squared bias (mean error) of the fitted values and the variance of the fitted values:

:\operatorname{MSPE}=\operatorname{ME}^2+\operatorname{VAR},

:\operatorname{ME}=\operatorname{E}\left[\widehat{g}(x_i)-g(x_i)\right],

:\operatorname{VAR}=\operatorname{E}\left[\left(\widehat{g}(x_i)-\operatorname{E}\left[\widehat{g}(x_i)\right]\right)^2\right].

The quantity \operatorname{SSPE}=n\cdot\operatorname{MSPE} is called the sum squared prediction error. The root mean squared prediction error is the square root of the MSPE: \operatorname{RMSPE}=\sqrt{\operatorname{MSPE}}.
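The decomposition can be checked numerically at a single point x_i by Monte Carlo: redraw the noise many times, refit, and compare the simulated \operatorname{E}\left[\operatorname{PE}_i^2\right] with \operatorname{ME}^2+\operatorname{VAR}. A sketch under the same hypothetical smoother as above:

```python
import numpy as np

rng = np.random.default_rng(1)

n, i0, n_rep = 100, 50, 20000   # sample size, test point index, replications
x = np.linspace(0.0, 1.0, n)
g = np.sin(2.0 * np.pi * x)

# Same hypothetical 5-point moving-average hat matrix as above.
L = np.zeros((n, n))
for i in range(n):
    window = np.arange(max(0, i - 2), min(n, i + 3))
    L[i, window] = 1.0 / len(window)

# Repeatedly simulate data and record the fitted value at the point x_{i0}.
fits = np.empty(n_rep)
for r in range(n_rep):
    y = g + 0.3 * rng.standard_normal(n)
    fits[r] = L[i0] @ y

mspe_i = np.mean((fits - g[i0]) ** 2)  # E[PE_i^2], estimated by simulation
me = np.mean(fits) - g[i0]             # ME  = E[g_hat(x_i) - g(x_i)]
var = np.var(fits)                     # VAR = E[(g_hat(x_i) - E[g_hat(x_i)])^2]
print(f"E[PE^2] = {mspe_i:.5f}, ME^2 + VAR = {me**2 + var:.5f}")  # should agree
```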


Computation of MSPE over out-of-sample data

The mean squared prediction error can be computed exactly in two contexts. First, with a data sample of length ''n'', the data analyst may run the regression over only ''q'' of the data points (with ''q'' < ''n''), holding back the other ''n'' – ''q'' data points with the specific purpose of using them to compute the estimated model's MSPE out of sample (i.e., not using data that were used in the model estimation process). Since the regression process is tailored to the ''q'' in-sample points, normally the in-sample MSPE will be smaller than the out-of-sample one computed over the ''n'' – ''q'' held-back points. If the increase in the MSPE out of sample relative to in sample is small, the model is viewed favorably. And if two models are to be compared, the one with the lower MSPE over the ''n'' – ''q'' out-of-sample data points is viewed more favorably, regardless of the models' relative in-sample performances. The out-of-sample MSPE in this context is exact for the out-of-sample data points that it was computed over, but is merely an estimate of the model's MSPE for the mostly unobserved population from which the data were drawn. Second, as time goes on more data may become available to the data analyst, and then the MSPE can be computed over these new data.


Estimation of MSPE over the population

When the model has been estimated over all available data with none held back, the MSPE of the model over the entire population of mostly unobserved data can be estimated as follows. For the model y_i=g(x_i)+\sigma\varepsilon_i where \varepsilon_i\sim\mathcal{N}(0,1), one may write

:n\cdot\operatorname{MSPE}(L)=g^{\mathsf{T}}(I-L)^{\mathsf{T}}(I-L)g+\sigma^2\operatorname{tr}\left[L^{\mathsf{T}}L\right].

Using in-sample data values, the first term on the right side is equivalent to

:\sum_{i=1}^n\left(\operatorname{E}\left[g(x_i)-\widehat{g}(x_i)\right]\right)^2 =\operatorname{E}\left[\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2\right]-\sigma^2\operatorname{tr}\left[\left(I-L\right)^{\mathsf{T}}\left(I-L\right)\right].

Since \operatorname{tr}\left[\left(I-L\right)^{\mathsf{T}}\left(I-L\right)\right]=n-2\operatorname{tr}\left[L\right]+\operatorname{tr}\left[L^{\mathsf{T}}L\right], it follows that

:n\cdot\operatorname{MSPE}(L)=\operatorname{E}\left[\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2\right]-\sigma^2\left(n-2\operatorname{tr}\left[L\right]\right).

If \sigma^2 is known or well-estimated by \widehat{\sigma}^2, it becomes possible to estimate the MSPE by

:n\cdot\widehat{\operatorname{MSPE}}(L)=\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2-\widehat{\sigma}^2\left(n-2\operatorname{tr}\left[L\right]\right).

Colin Mallows advocated this method in the construction of his model selection statistic ''Cp'', which is a normalized version of the estimated MSPE:

:C_p=\frac{\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2}{\widehat{\sigma}^2}-n+2p,

where ''p'' is the number of estimated parameters and \widehat{\sigma}^2 is computed from the version of the model that includes all possible regressors.
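As a concrete illustration, the following sketch assumes an ordinary least squares setting, where L=X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}} and hence \operatorname{tr}\left[L\right]=p, and estimates \widehat{\sigma}^2 from the model with all candidate regressors as described above; the data and regressors are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: y depends on x1 only; x2 is an irrelevant candidate regressor.
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + 0.8 * rng.standard_normal(n)

X_full = np.column_stack([np.ones(n), x1, x2])  # model with all candidate regressors
X_sub = X_full[:, :2]                           # candidate sub-model (intercept + x1)

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# sigma^2 estimated from the model that includes all possible regressors.
p_full = X_full.shape[1]
sigma2_hat = rss(X_full, y) / (n - p_full)

# For OLS, L = X (X^T X)^{-1} X^T and tr[L] = p, the number of parameters.
p = X_sub.shape[1]
mspe_hat = (rss(X_sub, y) - sigma2_hat * (n - 2 * p)) / n
cp = rss(X_sub, y) / sigma2_hat - n + 2 * p
print(f"estimated MSPE = {mspe_hat:.4f}, Mallows Cp = {cp:.2f}")
```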


See also

* Akaike information criterion
* Bias-variance tradeoff
* Mean squared error
* Errors and residuals in statistics
* Law of total variance
* Mallows's ''Cp''
* Model selection

