In statistics, Mallows's ''C<sub>p</sub>'', named for Colin Lingwood Mallows, is used to assess the fit of a regression model that has been estimated using ordinary least squares. It is applied in the context of model selection, where a number of predictor variables are available for predicting some outcome, and the goal is to find the best model involving a subset of these predictors. A small value of ''C<sub>p</sub>'' means that the model is relatively precise.
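As an illustration of subset selection with this statistic, the following is a minimal sketch rather than a reference implementation. It assumes the commonly cited form C_p = RSS_p / s² − n + 2p, where RSS_p is the residual sum of squares of the candidate model, s² is the residual mean square of the model containing all predictors, n is the sample size, and p counts the fitted coefficients including the intercept (conventions differ on whether the intercept is counted). The function names `rss` and `mallows_cp` and the simulated data are hypothetical and introduced only for this example.

```python
import itertools

import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X.

    X must already contain the intercept column.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def mallows_cp(X_full, y, subset):
    """C_p for the model with an intercept plus the columns listed in `subset`.

    Assumed form: RSS_p / s^2 - n + 2p, with s^2 taken from the full model
    and p counting the intercept.
    """
    n, k = X_full.shape
    ones = np.ones((n, 1))
    # Error variance estimated from the model with all k predictors.
    s2 = rss(np.hstack([ones, X_full]), y) / (n - k - 1)
    X_sub = np.hstack([ones, X_full[:, list(subset)]])
    p = X_sub.shape[1]  # fitted coefficients, intercept included
    return rss(X_sub, y) / s2 - n + 2 * p


# Simulated data (an assumption for the demo): four candidate predictors,
# of which only the first two actually influence the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=100)

# Score every subset of predictors and list them from smallest C_p upward.
scores = {
    subset: mallows_cp(X, y, subset)
    for size in range(X.shape[1] + 1)
    for subset in itertools.combinations(range(X.shape[1]), size)
}
for subset, cp in sorted(scores.items(), key=lambda item: item[1]):
    print(subset, round(cp, 2))
```

Under this form the model containing all predictors always has C_p equal to its own p by construction, so the comparison is informative only for the proper subsets.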
Mallows's ''C<sub>p</sub>'' has been shown to be equivalent to the Akaike information criterion in the special case of Gaussian linear regression.
Definition and properties
Mallows's ''C<sub>p</sub>'' addresses the issue of overfitting, in which model selection statistics such as the residual sum of squares never increase, and usually decrease, as more variables are added to a model. Thus, if we aim to select the model giving the smallest residual sum of squares, the model including all variables would always be selected. Instead, the ''C<sub>p</sub>'' statistic calculated on a sample of data estimates the sum squared prediction error (SSPE) as its population