In statistics, shrinkage is the reduction in the effects of sampling variation. In

regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...

, a fitted relationship appears to perform less well on a new data set than on the data set used for fitting. In particular the value of the coefficient of determination 'shrinks'. This idea is complementary to

overfitting mathematical modeling, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit to additional data or predict future observations reliably". An overfitt ...

and, separately, to the standard adjustment made in the coefficient of determination to compensate for the subjunctive effects of further sampling, like controlling for the potential of new explanatory terms improving the model by chance: that is, the adjustment formula itself provides "shrinkage." But the adjustment formula yields an artificial shrinkage. A shrinkage estimator is an

estimator In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...

that, either explicitly or implicitly, incorporates the effects of shrinkage. In loose terms this means that a naive or raw estimate is improved by combining it with other information. The term relates to the notion that the improved estimate is made closer to the value supplied by the 'other information' than the raw estimate. In this sense, shrinkage is used to regularize ill-posed

inference Inferences are steps in reasoning, moving from premises to logical consequences; etymologically, the word '' infer'' means to "carry forward". Inference is theoretically traditionally divided into deduction and induction, a distinction that ...

problems. Shrinkage is implicit in Bayesian inference and penalized likelihood inference, and explicit in James–Stein-type inference. In contrast, simple types of maximum-likelihood and least-squares estimation procedures do not include shrinkage effects, although they can be used within shrinkage estimation schemes.

Description

Many standard estimators can be improved, in terms of

mean squared error In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference betwe ...

(MSE), by shrinking them towards zero (or any other fixed constant value). In other words, the improvement in the estimate from the corresponding reduction in the width of the confidence interval can outweigh the worsening of the estimate introduced by biasing the estimate towards zero (see bias-variance tradeoff). Assume that the expected value of the raw estimate is not zero and consider other estimators obtained by multiplying the raw estimate by a certain parameter. A value for this parameter can be specified so as to minimize the MSE of the new estimate. For this value of the parameter, the new estimate will have a smaller MSE than the raw one. Thus it has been improved. An effect here may be to convert an unbiased raw estimate to an improved biased one.

Examples

A well-known example arises in the estimation of the population

variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of number ...

sample variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of number ...

. For a sample size of ''n'', the use of a divisor ''n'' − 1 in the usual formula (

Bessel's correction In statistics, Bessel's correction is the use of ''n'' − 1 instead of ''n'' in the formula for the sample variance and sample standard deviation, where ''n'' is the number of observations in a sample. This method corrects the bias i ...

) gives an unbiased estimator, while other divisors have lower MSE, at the expense of bias. The optimal choice of divisor (weighting of shrinkage) depends on the excess kurtosis of the population, as discussed at mean squared error: variance, but one can always do better (in terms of MSE) than the unbiased estimator; for the normal distribution a divisor of ''n'' + 1 gives one which has the minimum mean squared error.

Methods

Types of

regression Regression or regressions may refer to: Science * Marine regression, coastal advance due to falling sea level, the opposite of marine transgression * Regression (medicine), a characteristic of diseases to express lighter symptoms or less extent ( ...

that involve shrinkage estimates include

ridge regression Ridge regression is a method of estimating the coefficients of multiple- regression models in scenarios where the independent variables are highly correlated. It has been used in many fields including econometrics, chemistry, and engineering. Also ...

, where coefficients derived from a regular least squares regression are brought closer to zero by multiplying by a constant (the ''shrinkage factor''), and lasso regression, where coefficients are brought closer to zero by adding or subtracting a constant. The use of shrinkage estimators in the context of regression analysis, where there may be a large number of explanatory variables, has been described by Copas. Here the values of the estimated regression coefficients are shrunk towards zero with the effect of reducing the mean square error of predicted values from the model when applied to new data. A later paper by Copas applies shrinkage in a context where the problem is to predict a binary response on the basis of binary explanatory variables. Hausser and Strimmer "develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, ...it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. ...method is fully analytic and hence computationally inexpensive. Moreover, ...procedure simultaneously provides estimates of the entropy and of the cell frequencies. ...The proposed shrinkage estimators of entropy and mutual information, as well as all other investigated entropy estimators, have been implemented in R (R Development Core Team, 2008). A corresponding R package “entropy” was deposited in the R archive CRAN and is accessible at the URL https://cran.r-project.org/web/packages/entropy/ under the GNU General Public License."

Statistical software

References

Estimation theory Estimator {{stat-stub

Description

Examples

Methods

See also

Statistical software

References