In
statistics, the jackknife (jackknife cross-validation) is a
cross-validation technique and, therefore, a form of
resampling.
It is especially useful for
bias
Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individual, a group ...
and
variance
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of number ...
estimation. The jackknife pre-dates other common resampling methods such as the
bootstrap. Given a sample of size
, a jackknife
estimator
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
can be built by aggregating the parameter estimates from each subsample of size
obtained by omitting one observation.
The jackknife technique was developed by
Maurice Quenouille
Prof Maurice Henri Quenouille FRSE FRSS (1924 – 12 December 1973) was a 20th-century British statistician remembered as the creator of Jackknife resampling.
Biography
The unusual surname is French in origin, meaning " distaff". The surname ha ...
(1924–1973) from 1949 and refined in 1956.
John Tukey
John Wilder Tukey (; June 16, 1915 – July 26, 2000) was an American mathematician and statistician, best known for the development of the fast Fourier Transform (FFT) algorithm and box plot. The Tukey range test, the Tukey lambda distributi ...
expanded on the technique in 1958 and proposed the name "jackknife" because, like a physical
jack-knife (a compact folding knife), it is a
rough-and-ready tool that can improvise a solution for a variety of problems even though specific problems may be more efficiently solved with a purpose-designed tool.
The jackknife is a linear approximation of the
bootstrap.
A simple example: Mean estimation
The jackknife
estimator
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished. For example, the ...
of a parameter is found by systematically leaving out each observation from a dataset and calculating the parameter estimate over the remaining observations and then aggregating these calculations.
For example, if the parameter to be estimated is the population mean of random variable ''
'', then for a given set of i.i.d. observations
the natural estimator is the sample mean:
:
where the last sum used another way to indicate that the index
runs over the set
.
Then we proceed as follows: For each