Mean squared prediction error
In statistics, the mean squared prediction error (MSPE), also known as the mean squared error of the predictions, of a smoothing, curve fitting, or regression procedure is the expected value of the squared prediction errors (PE), the squared difference between the fitted values implied by the predictive function \widehat{g} and the values of the (unobservable) true function ''g''. It is an inverse measure of the ''explanatory power'' of \widehat{g}, and can be used in the process of cross-validation of an estimated model. Knowledge of ''g'' would be required in order to calculate the MSPE exactly; in practice, the MSPE is estimated.


Formulation

If the smoothing or fitting procedure has projection matrix (i.e., hat matrix) ''L'', which maps the observed values vector y to the predicted values vector \hat{y}=Ly, then PE and MSPE are formulated as:

:\operatorname{PE}_i=g(x_i)-\widehat{g}(x_i),

:\operatorname{MSPE}=\operatorname{E}\left[\operatorname{PE}_i^2\right]=\sum_{i=1}^n \operatorname{PE}_i^2/n.
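For concreteness, the following minimal Python/NumPy sketch builds the hat matrix ''L'' of a simple linear smoother and evaluates PE and MSPE against a known ''g''. The moving-average smoother, sample size, noise level, and true function are hypothetical choices made for illustration, not part of the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: true function g, noisy observations y = g(x) + noise.
n = 100
x = np.linspace(0.0, 1.0, n)
g = np.sin(2.0 * np.pi * x)            # the (normally unobservable) true values
y = g + 0.3 * rng.standard_normal(n)   # observed values

# A simple linear smoother: 5-point moving average.
# Its hat matrix L maps observations y to fitted values y_hat = L @ y.
L = np.zeros((n, n))
for i in range(n):
    window = np.arange(max(0, i - 2), min(n, i + 3))
    L[i, window] = 1.0 / len(window)

y_hat = L @ y                          # fitted values g_hat(x_i)

pe = g - y_hat                         # prediction errors PE_i = g(x_i) - g_hat(x_i)
mspe = np.mean(pe ** 2)                # MSPE = sum of PE_i^2 over n
print(f"MSPE = {mspe:.4f}")
```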
The MSPE can be decomposed into two terms: the squared bias (mean error) of the fitted values and the variance of the fitted values:

:\operatorname{MSPE}=\operatorname{ME}^2+\operatorname{VAR},

:\operatorname{ME}=\operatorname{E}\left[\widehat{g}(x_i)-g(x_i)\right],

:\operatorname{VAR}=\operatorname{E}\left[\left(\widehat{g}(x_i)-\operatorname{E}\left[\widehat{g}(x_i)\right]\right)^2\right].

The quantity \operatorname{SSPE}=n\cdot\operatorname{MSPE} is called the sum squared prediction error. The root mean squared prediction error is the square root of the MSPE: \operatorname{RMSPE}=\sqrt{\operatorname{MSPE}}.
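The decomposition can be checked numerically at a single point x_i by Monte Carlo: redraw the noise many times, refit, and compare the simulated \operatorname{E}\left[\operatorname{PE}_i^2\right] with \operatorname{ME}^2+\operatorname{VAR}. A sketch under the same hypothetical smoother as above:

```python
import numpy as np

rng = np.random.default_rng(1)

n, i0, n_rep = 100, 50, 20000   # sample size, test point index, replications
x = np.linspace(0.0, 1.0, n)
g = np.sin(2.0 * np.pi * x)

# Same hypothetical 5-point moving-average hat matrix as above.
L = np.zeros((n, n))
for i in range(n):
    window = np.arange(max(0, i - 2), min(n, i + 3))
    L[i, window] = 1.0 / len(window)

# Repeatedly simulate data and record the fitted value at the point x_{i0}.
fits = np.empty(n_rep)
for r in range(n_rep):
    y = g + 0.3 * rng.standard_normal(n)
    fits[r] = L[i0] @ y

mspe_i = np.mean((fits - g[i0]) ** 2)  # E[PE_i^2], estimated by simulation
me = np.mean(fits) - g[i0]             # ME  = E[g_hat(x_i) - g(x_i)]
var = np.var(fits)                     # VAR = E[(g_hat(x_i) - E[g_hat(x_i)])^2]
print(f"E[PE^2] = {mspe_i:.5f}, ME^2 + VAR = {me**2 + var:.5f}")  # should agree
```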


Computation of MSPE over out-of-sample data

The mean squared prediction error can be computed exactly in two contexts. First, with a data sample of length ''n'', the data analyst may run the regression over only ''q'' of the data points (with ''q'' < ''n''), holding back the other ''n'' – ''q'' data points with the specific purpose of using them to compute the estimated model's MSPE out of sample (i.e., not using data that were used in the model estimation process). Since the regression process is tailored to the ''q'' in-sample points, normally the in-sample MSPE will be smaller than the out-of-sample one computed over the ''n'' – ''q'' held-back points. If the increase in the MSPE out of sample relative to in sample is small, the model is viewed favorably. And if two models are to be compared, the one with the lower MSPE over the ''n'' – ''q'' out-of-sample data points is viewed more favorably, regardless of the models' relative in-sample performances. The out-of-sample MSPE in this context is exact for the out-of-sample data points that it was computed over, but is merely an estimate of the model's MSPE for the mostly unobserved population from which the data were drawn. Second, as time goes on more data may become available to the data analyst, and then the MSPE can be computed over these new data.


Estimation of MSPE over the population

When the model has been estimated over all available data with none held back, the MSPE of the model over the entire population of mostly unobserved data can be estimated as follows. For the model y_i=g(x_i)+\sigma\varepsilon_i where \varepsilon_i\sim\mathcal{N}(0,1), one may write

:n\cdot\operatorname{MSPE}(L)=g^{\mathsf{T}}(I-L)^{\mathsf{T}}(I-L)g+\sigma^2\operatorname{tr}\left[L^{\mathsf{T}}L\right].

Using in-sample data values, the first term on the right side is equivalent to

:\sum_{i=1}^n\left(\operatorname{E}\left[g(x_i)-\widehat{g}(x_i)\right]\right)^2 =\operatorname{E}\left[\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2\right]-\sigma^2\operatorname{tr}\left[\left(I-L\right)^{\mathsf{T}}\left(I-L\right)\right].

Since \operatorname{tr}\left[\left(I-L\right)^{\mathsf{T}}\left(I-L\right)\right]=n-2\operatorname{tr}\left[L\right]+\operatorname{tr}\left[L^{\mathsf{T}}L\right], it follows that

:n\cdot\operatorname{MSPE}(L)=\operatorname{E}\left[\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2\right]-\sigma^2\left(n-2\operatorname{tr}\left[L\right]\right).

If \sigma^2 is known or well-estimated by \widehat{\sigma}^2, it becomes possible to estimate the MSPE by

:n\cdot\widehat{\operatorname{MSPE}}(L)=\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2-\widehat{\sigma}^2\left(n-2\operatorname{tr}\left[L\right]\right).

Colin Mallows advocated this method in the construction of his model selection statistic ''Cp'', which is a normalized version of the estimated MSPE:

:C_p=\frac{\sum_{i=1}^n\left(y_i-\widehat{g}(x_i)\right)^2}{\widehat{\sigma}^2}-n+2p,

where ''p'' is the number of estimated parameters and \widehat{\sigma}^2 is computed from the version of the model that includes all possible regressors.
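As a concrete illustration, the following sketch assumes an ordinary least squares setting, where L=X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}} and hence \operatorname{tr}\left[L\right]=p, and estimates \widehat{\sigma}^2 from the model with all candidate regressors as described above; the data and regressors are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: y depends on x1 only; x2 is an irrelevant candidate regressor.
n = 200
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 1.0 + 2.0 * x1 + 0.8 * rng.standard_normal(n)

X_full = np.column_stack([np.ones(n), x1, x2])  # model with all candidate regressors
X_sub = X_full[:, :2]                           # candidate sub-model (intercept + x1)

def rss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# sigma^2 estimated from the model that includes all possible regressors.
p_full = X_full.shape[1]
sigma2_hat = rss(X_full, y) / (n - p_full)

# For OLS, L = X (X^T X)^{-1} X^T and tr[L] = p, the number of parameters.
p = X_sub.shape[1]
mspe_hat = (rss(X_sub, y) - sigma2_hat * (n - 2 * p)) / n
cp = rss(X_sub, y) / sigma2_hat - n + 2 * p
print(f"estimated MSPE = {mspe_hat:.4f}, Mallows Cp = {cp:.2f}")
```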


See also

* Akaike information criterion
* Bias-variance tradeoff
* Mean squared error
* Errors and residuals in statistics
* Law of total variance
* Mallows's ''Cp''
* Model selection

