In statistics, the jackknife (jackknife cross-validation) is a cross-validation technique and, therefore, a form of resampling. It is especially useful for bias and variance estimation. The jackknife pre-dates other common resampling methods such as the bootstrap. Given a sample of size $n$, a jackknife estimator can be built by aggregating the parameter estimates from each subsample of size $(n-1)$ obtained by omitting one observation. The jackknife technique was developed by Maurice Quenouille (1924–1973) from 1949 and refined in 1956. John Tukey expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical jack-knife (a compact folding knife), it is a rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool. The jackknife is a linear approximation of the bootstrap.

A simple example: Mean estimation

The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset, calculating the parameter estimate over the remaining observations, and then aggregating these calculations. For example, if the parameter to be estimated is the population mean of a random variable $x$, then for a given set of i.i.d. observations $x_1, \ldots, x_n$ the natural estimator is the sample mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i\in[n]} x_i,$$

where the last sum uses another way to indicate that the index $i$ runs over the set $[n] = \{1,\ldots,n\}$. Then we proceed as follows: for each $i \in [n]$ we compute the mean $\bar{x}_{(i)}$ of the jackknife subsample consisting of all but the $i$-th data point; this is called the $i$-th jackknife replicate:

$$\bar{x}_{(i)} = \frac{1}{n-1}\sum_{j\in[n],\, j\ne i} x_j, \quad i=1,\dots,n.$$

It can help to think of these $n$ jackknife replicates $\bar{x}_{(1)},\ldots,\bar{x}_{(n)}$ as an approximation of the distribution of the sample mean $\bar{x}$; the larger $n$ is, the better this approximation. Finally, to get the jackknife estimator we take the average of these $n$ jackknife replicates:

$$\bar{x}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^n \bar{x}_{(i)}.$$

One may ask about the bias and the variance of $\bar{x}_{\mathrm{jack}}$. Starting from the definition of $\bar{x}_{\mathrm{jack}}$ as the average of the jackknife replicates, one could try to calculate them explicitly: the bias is a trivial calculation, but the variance of $\bar{x}_{\mathrm{jack}}$ is more involved because the jackknife replicates are not independent.
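The construction can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `jackknife_mean_replicates` and `jackknife_mean` are hypothetical, and each replicate is computed naively by rebuilding the subsample without the $i$-th point.

```python
from typing import List


def jackknife_mean_replicates(xs: List[float]) -> List[float]:
    """Return the n leave-one-out means x̄_(i), each computed from n - 1 points."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    replicates = []
    for i in range(n):
        subsample = xs[:i] + xs[i + 1:]  # every point except the i-th
        replicates.append(sum(subsample) / (n - 1))
    return replicates


def jackknife_mean(xs: List[float]) -> float:
    """Average of the leave-one-out means, i.e. the jackknife estimate x̄_jack."""
    reps = jackknife_mean_replicates(xs)
    return sum(reps) / len(reps)


data = [1.0, 2.0, 6.0]
print(jackknife_mean_replicates(data))  # [4.0, 3.5, 1.5]
print(jackknife_mean(data))             # 3.0
```

For this small sample the average of the replicates equals the ordinary sample mean $(1+2+6)/3 = 3$.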

For the special case of the mean, one can show explicitly that the jackknife estimate equals the usual estimate:

$$\frac{1}{n}\sum_{i=1}^n \bar{x}_{(i)} = \bar{x}.$$

This establishes the identity $\bar{x}_{\mathrm{jack}} = \bar{x}$. Taking expectations, we get $E[\bar{x}_{\mathrm{jack}}] = E[\bar{x}] = E[x]$, so $\bar{x}_{\mathrm{jack}}$ is unbiased; taking variances, we get $V[\bar{x}_{\mathrm{jack}}] = V[\bar{x}] = V[x]/n$. However, these properties do not hold in general for parameters other than the mean.
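The identity can be checked in one line by writing each leave-one-out mean in terms of the full-sample mean, $\bar{x}_{(i)} = (n\bar{x} - x_i)/(n-1)$, and summing over $i$:

```latex
\frac{1}{n}\sum_{i=1}^n \bar{x}_{(i)}
  = \frac{1}{n}\sum_{i=1}^n \frac{n\bar{x} - x_i}{n-1}
  = \frac{n^2\bar{x} - n\bar{x}}{n(n-1)}
  = \bar{x}.
```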

This simple example of mean estimation is meant only to illustrate the construction of a jackknife estimator; the real subtleties (and the usefulness) emerge when estimating other parameters, such as higher moments or other functionals of the distribution.

Note that $\bar{x}_{\mathrm{jack}}$ could be used to construct an empirical estimate of the bias of $\bar{x}$, namely $\widehat{\operatorname{bias}}(\bar{x})_{\mathrm{jack}} = c(\bar{x}_{\mathrm{jack}} - \bar{x})$ with some suitable factor $c > 0$. In this case we know that $\bar{x}_{\mathrm{jack}} = \bar{x}$, so this construction adds no meaningful knowledge, but it is reassuring that it gives the correct estimate of the bias (which is zero).

A jackknife estimate of the variance of $\bar{x}$ can be calculated from the variance of the jackknife replicates $\bar{x}_{(i)}$:

$$\widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}} = \frac{n-1}{n}\sum_{i=1}^n \left(\bar{x}_{(i)} - \bar{x}_{\mathrm{jack}}\right)^2 = \frac{1}{n(n-1)}\sum_{i=1}^n (x_i - \bar{x})^2.$$

The left equality defines the estimator $\widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}}$ and the right equality is an identity that can be verified directly. Taking expectations, we get $E[\widehat{\operatorname{var}}(\bar{x})_{\mathrm{jack}}] = V[x]/n = V[\bar{x}]$, so this is an unbiased estimator of the variance of $\bar{x}$.
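Both sides of this identity can be compared numerically. The sketch below is illustrative only: the helper names are hypothetical, and it computes the leave-one-out means with the shortcut $\bar{x}_{(i)} = (n\bar{x} - x_i)/(n-1)$ instead of re-summing each subsample.

```python
from statistics import mean
from typing import Sequence


def jackknife_variance_of_mean(xs: Sequence[float]) -> float:
    """Left-hand side: (n-1)/n times the squared spread of the leave-one-out means."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(xs)  # equals n * x̄
    reps = [(total - x) / (n - 1) for x in xs]  # x̄_(i) = (n x̄ - x_i) / (n - 1)
    rep_mean = mean(reps)  # x̄_jack
    return (n - 1) / n * sum((r - rep_mean) ** 2 for r in reps)


def closed_form_variance_of_mean(xs: Sequence[float]) -> float:
    """Right-hand side: sum of squared deviations divided by n(n-1)."""
    n = len(xs)
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n * (n - 1))


data = [1.0, 2.0, 6.0]
print(jackknife_variance_of_mean(data), closed_form_variance_of_mean(data))  # both about 2.3333
```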

Estimating the bias of an estimator

The jackknife technique can be used to estimate (and correct) the bias of an estimator calculated over the entire sample. Suppose $\theta$ is the target parameter of interest, which is assumed to be some functional of the distribution of $x$. Based on a finite set of observations $x_1, \ldots, x_n$, which is assumed to consist of i.i.d. copies of $x$, the estimator $\hat{\theta}$ is constructed:

$$\hat{\theta} = f_n(x_1,\ldots,x_n).$$

The value of $\hat{\theta}$ is sample-dependent, so it will change from one random sample to another. By definition, the bias of $\hat{\theta}$ is:

$$\operatorname{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta.$$

One may wish to compute several values of $\hat{\theta}$ from several samples and average them to obtain an empirical approximation of $E[\hat{\theta}]$, but this is impossible when there are no "other samples", that is, when the entire set of available observations $x_1, \ldots, x_n$ was used to calculate $\hat{\theta}$. In this kind of situation the jackknife resampling technique may help. We construct the jackknife replicates:

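$$\begin{aligned}
\hat{\theta}_{(1)} &= f_{n-1}(x_2, x_3, \ldots, x_n)\\
\hat{\theta}_{(2)} &= f_{n-1}(x_1, x_3, \ldots, x_n)\\
&\;\;\vdots\\
\hat{\theta}_{(n)} &= f_{n-1}(x_1, x_2, \ldots, x_{n-1})
\end{aligned}$$

where each replicate is a "leave-one-out" estimate based on the jackknife subsample consisting of all but one of the data points:

$$\hat{\theta}_{(i)} = f_{n-1}(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n), \quad i=1,\dots,n.$$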
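In code, the replicates only require the ability to evaluate the estimator on a shortened sample. The sketch below assumes, for illustration, that the family $f_n$ is represented by a single Python callable that accepts a sample of any length, so that $f_{n-1}$ and $f_n$ are the same function; `jackknife_replicates` is a hypothetical helper name.

```python
from typing import Callable, List, Sequence


def jackknife_replicates(
    xs: Sequence[float], estimator: Callable[[Sequence[float]], float]
) -> List[float]:
    """Return [θ̂_(1), ..., θ̂_(n)], where θ̂_(i) applies `estimator` to xs without x_i."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    # Leave out position i and evaluate the same estimator on the remaining n - 1 points.
    return [estimator(list(xs[:i]) + list(xs[i + 1:])) for i in range(n)]


print(jackknife_replicates([1.0, 2.0, 6.0], max))  # [6.0, 6.0, 2.0]
```

Any function of a sample can be plugged in; with the sample mean as the estimator, the replicates are the leave-one-out means from the previous section.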

Then we define their average:

$$\hat{\theta}_{\mathrm{jack}} = \frac{1}{n}\sum_{i=1}^n \hat{\theta}_{(i)}.$$

The jackknife estimate of the bias of $\hat{\theta}$ is given by:

$$\widehat{\operatorname{bias}}(\hat{\theta})_{\mathrm{jack}} = (n-1)\left(\hat{\theta}_{\mathrm{jack}} - \hat{\theta}\right)$$

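and the resulting bias-corrected jackknife estimate of $\theta$ is given by:

$$\hat{\theta}_{\mathrm{jack}}^{*} = \hat{\theta} - \widehat{\operatorname{bias}}(\hat{\theta})_{\mathrm{jack}} = n\hat{\theta} - (n-1)\hat{\theta}_{\mathrm{jack}}.$$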

This removes the bias in the special case that the bias is $O(n^{-1})$ and reduces it to $O(n^{-2})$ in other cases.
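The bias estimate and the bias-corrected estimate can be sketched as follows. The test estimator is an illustrative choice, not taken from the text above: the plug-in variance $\frac{1}{n}\sum_i (x_i - \bar{x})^2$, whose bias is exactly $-V[x]/n$, a pure $O(n^{-1})$ term. The jackknife correction should therefore remove the bias entirely, and the corrected value coincides with the usual sample variance with divisor $n-1$. The names `plug_in_variance` and `jackknife_bias_correct` are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def plug_in_variance(xs: Sequence[float]) -> float:
    """Biased variance estimator: divides by n rather than n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def jackknife_bias_correct(
    xs: Sequence[float], estimator: Callable[[Sequence[float]], float]
) -> Tuple[float, float]:
    """Return (jackknife bias estimate, bias-corrected estimate) of `estimator` on xs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    theta_hat = estimator(xs)  # full-sample estimate θ̂
    reps = [estimator(list(xs[:i]) + list(xs[i + 1:])) for i in range(n)]
    theta_jack = sum(reps) / n  # average of the leave-one-out replicates
    bias = (n - 1) * (theta_jack - theta_hat)
    return bias, theta_hat - bias  # θ̂ - bias = n θ̂ - (n - 1) θ̂_jack


bias, corrected = jackknife_bias_correct([1.0, 2.0, 6.0], plug_in_variance)
print(bias, corrected)  # approximately -2.3333 and 7.0
```

Here the plug-in variance is $14/3$, while the corrected value agrees, up to rounding, with the divisor-$(n-1)$ sample variance $14/2 = 7$.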

Estimating the variance of an estimator

The jackknife technique can also be used to estimate the variance of an estimator calculated over the entire sample.
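The text does not spell out the formula at this point. A common form, which reduces to the jackknife variance estimate of $\bar{x}$ from the mean example when $\hat{\theta}$ is the sample mean, replaces the leave-one-out means by the general replicates: $\widehat{\operatorname{var}}(\hat{\theta})_{\mathrm{jack}} = \frac{n-1}{n}\sum_{i=1}^n (\hat{\theta}_{(i)} - \hat{\theta}_{\mathrm{jack}})^2$. The sketch below assumes this form; `jackknife_variance` and `sample_mean` are hypothetical names.

```python
from typing import Callable, Sequence


def jackknife_variance(
    xs: Sequence[float], estimator: Callable[[Sequence[float]], float]
) -> float:
    """Assumed jackknife variance: (n-1)/n times the squared spread of the replicates."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    reps = [estimator(list(xs[:i]) + list(xs[i + 1:])) for i in range(n)]
    theta_jack = sum(reps) / n  # θ̂_jack
    return (n - 1) / n * sum((r - theta_jack) ** 2 for r in reps)


def sample_mean(xs: Sequence[float]) -> float:
    """The sample mean, used here as a test estimator."""
    return sum(xs) / len(xs)


# With the sample mean as the estimator, this gives about 7/3 (= s**2 / n) for this data.
print(jackknife_variance([1.0, 2.0, 6.0], sample_mean))
```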

