Plug-in principle

In statistics, resampling is the creation of new samples based on one observed sample. Resampling methods are:
# Permutation tests (also re-randomization tests)
# Bootstrapping
# Cross validation


Permutation tests

Permutation tests rely on resampling the original data under the assumption that the null hypothesis is true. From the resampled data it can be concluded how likely the original data are to occur under the null hypothesis.
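For illustration, the following Python sketch implements a simple two-sample permutation test of a difference in means; the sample data, the number of permutations and the two-sided rejection rule are arbitrary choices for the example.

```python
import numpy as np

def permutation_test(x, y, n_permutations=10_000, rng=None):
    """Two-sided permutation test for a difference in means between samples x and y."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(pooled)          # relabel the pooled data under the null hypothesis
        stat = perm[:len(x)].mean() - perm[len(x):].mean()
        if abs(stat) >= abs(observed):          # at least as extreme as the observed statistic
            count += 1
    return observed, (count + 1) / (n_permutations + 1)   # p-value with add-one correction

# Example usage with made-up data:
diff, p = permutation_test([12.6, 11.4, 13.2, 11.9], [10.1, 9.8, 11.0, 10.4], rng=0)
print(f"observed difference = {diff:.2f}, p = {p:.3f}")
```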


Bootstrap

Bootstrapping is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. It has been called the plug-in principle (Logan, J. David and Wolesensky, William R. ''Mathematical Methods in Biology''. Pure and Applied Mathematics: a Wiley-Interscience Series of Texts, Monographs, and Tracts. John Wiley & Sons, Inc. 2009. Chapter 6: Statistical inference. Section 6.6: Bootstrap methods), as it is the method of estimation of functionals of a population distribution by evaluating the same functionals at the empirical distribution based on a sample. For example, when estimating the population mean, this method uses the sample mean; to estimate the population median, it uses the sample median; to estimate the population regression line, it uses the sample regression line.

The bootstrap may also be used for constructing hypothesis tests. It is often used as a robust alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors.

Bootstrapping techniques are also used in the updating-selection transitions of particle filters, genetic-type algorithms and related resample/reconfiguration Monte Carlo methods used in computational physics. In this context, the bootstrap is used to sequentially replace empirical weighted probability measures by empirical measures: samples with low weights are replaced by copies of the samples with high weights.
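As a minimal illustrative sketch, the following Python code bootstraps the standard error and a percentile confidence interval of a sample mean; the statistic, the number of resamples and the 95% level are arbitrary choices for the example.

```python
import numpy as np

def bootstrap(sample, statistic=np.mean, n_resamples=10_000, alpha=0.05, rng=None):
    """Bootstrap the sampling distribution of `statistic` by resampling with replacement."""
    rng = np.random.default_rng(rng)
    sample = np.asarray(sample, float)
    # The plug-in idea: treat the empirical distribution as if it were the population.
    replicates = np.array([
        statistic(rng.choice(sample, size=sample.size, replace=True))
        for _ in range(n_resamples)
    ])
    std_error = replicates.std(ddof=1)
    ci = np.percentile(replicates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return std_error, tuple(ci)

# Example usage with made-up data:
se, (lo, hi) = bootstrap([2.1, 3.4, 2.9, 5.0, 3.8, 4.2, 2.5], rng=0)
print(f"bootstrap SE = {se:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```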


Cross-validation

Cross-validation is a statistical method for validating a predictive model. Subsets of the data are held out for use as validation sets; a model is fit to the remaining data (a training set) and used to predict for the validation set. Averaging the quality of the predictions across the validation sets yields an overall measure of prediction accuracy. Cross-validation is employed repeatedly in building decision trees.

One form of cross-validation leaves out a single observation at a time; this is similar to the jackknife. Another, ''K''-fold cross-validation, splits the data into ''K'' subsets; each is held out in turn as the validation set. This avoids "self-influence". For comparison, in regression analysis methods such as linear regression, each ''y'' value draws the regression line toward itself, making the prediction of that value appear more accurate than it really is. Cross-validation applied to linear regression predicts the ''y'' value for each observation without using that observation.

Cross-validation is often used for deciding how many predictor variables to use in regression. Without cross-validation, adding predictors always reduces the residual sum of squares (or possibly leaves it unchanged). In contrast, the cross-validated mean squared error will tend to decrease if valuable predictors are added, but to increase if worthless predictors are added.
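As an illustrative sketch, the following Python code performs ''K''-fold cross-validation of a one-predictor least-squares regression and reports the cross-validated mean squared error; ''K'' = 5 and the simulated data are arbitrary choices.

```python
import numpy as np

def k_fold_cv_mse(x, y, k=5, rng=None):
    """Cross-validated mean squared error of a one-predictor least-squares fit."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    folds = np.array_split(rng.permutation(len(x)), k)        # shuffle indices, split into K folds
    errors = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        slope, intercept = np.polyfit(x[train], y[train], 1)  # fit on the training folds only
        pred = slope * x[held_out] + intercept                # predict the held-out y values
        errors.append(np.mean((y[held_out] - pred) ** 2))
    return float(np.mean(errors))

# Example usage with simulated data:
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 60)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)
print(f"5-fold CV mean squared error: {k_fold_cv_mse(x, y, rng=1):.3f}")
```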


Subsampling

Subsampling is an alternative method for approximating the sampling distribution of an estimator. The two key differences from the bootstrap are:
# the resample size is smaller than the sample size, and
# resampling is done without replacement.
The advantage of subsampling is that it is valid under much weaker conditions than the bootstrap. In particular, a set of sufficient conditions is that the rate of convergence of the estimator is known and that the limiting distribution is continuous. In addition, the resample (or subsample) size must tend to infinity together with the sample size, but at a smaller rate, so that their ratio converges to zero. While subsampling was originally proposed for the case of independent and identically distributed (iid) data only, the methodology has been extended to cover time series data as well; in this case, one resamples blocks of subsequent data rather than individual data points. There are many cases of applied interest where subsampling leads to valid inference whereas bootstrapping does not; such cases include, for example, estimators whose rate of convergence is not the square root of the sample size, or situations where the limiting distribution is non-normal. When both subsampling and the bootstrap are consistent, the bootstrap is typically more accurate. RANSAC is a popular algorithm using subsampling.
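The following Python sketch approximates the sampling distribution of the sample mean by subsampling without replacement with subsample size ''b'' much smaller than ''n''; the rescaling by √''b'' assumes the usual √''n'' rate of convergence, and the data and sizes are arbitrary choices for the example.

```python
import numpy as np

def subsample_distribution(sample, b, n_subsamples=5_000, rng=None):
    """Approximate the sampling distribution of the mean via subsamples of size b < n,
    drawn without replacement and centred at the full-sample mean."""
    rng = np.random.default_rng(rng)
    sample = np.asarray(sample, float)
    theta_hat = sample.mean()
    # sqrt(b) * (theta_b - theta_hat) approximates the law of sqrt(n) * (theta_hat - theta).
    return np.array([
        np.sqrt(b) * (rng.choice(sample, size=b, replace=False).mean() - theta_hat)
        for _ in range(n_subsamples)
    ])

# Example usage: a 95% confidence interval for the mean built from the subsampling distribution.
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=400)
dist = subsample_distribution(data, b=40, rng=1)
q_lo, q_hi = np.percentile(dist, [2.5, 97.5])
n = len(data)
print(f"mean = {data.mean():.3f}, "
      f"95% CI = ({data.mean() - q_hi / np.sqrt(n):.3f}, {data.mean() - q_lo / np.sqrt(n):.3f})")
```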


Jackknife cross-validation

Jackknifing (jackknife cross-validation) is used in statistical inference to estimate the bias and standard error (variance) of a statistic when a random sample of observations is used to calculate it. Historically, this method preceded the invention of the bootstrap, with Quenouille inventing it in 1949 and Tukey extending it in 1958. The method was foreshadowed by Mahalanobis, who in 1946 suggested repeated estimates of the statistic of interest with half the sample chosen at random; he coined the name 'interpenetrating samples' for this method.

Quenouille invented this method with the intention of reducing the bias of the sample estimate. Tukey extended it by assuming that if the replicates could be considered identically and independently distributed, then an estimate of the variance of the sample parameter could be made, and that it would be approximately distributed as a ''t'' variate with ''n''−1 degrees of freedom (''n'' being the sample size).

The basic idea behind the jackknife variance estimator lies in systematically recomputing the statistic estimate, leaving out one or more observations at a time from the sample set. From this new set of replicates of the statistic, an estimate of the bias and an estimate of the variance of the statistic can be calculated. Instead of using the jackknife to estimate the variance, it may instead be applied to the log of the variance. This transformation may result in better estimates, particularly when the distribution of the variance itself is non-normal.

For many statistical parameters the jackknife estimate of variance tends asymptotically to the true value almost surely; in technical terms, one says that the jackknife estimate is consistent. The jackknife is consistent for the sample means, sample variances, central and non-central t-statistics (with possibly non-normal populations), the sample coefficient of variation, maximum likelihood estimators, least-squares estimators, correlation coefficients and regression coefficients. It is not consistent for the sample median. In the case of a unimodal variate, the ratio of the jackknife variance to the sample variance tends to be distributed as one half the square of a chi-squared distribution with two degrees of freedom.

The jackknife, like the original bootstrap, depends on the independence of the data. Extensions of the jackknife to allow for dependence in the data have been proposed. Another extension is the delete-a-group method used in association with Poisson sampling. The jackknife is procedurally equivalent to leave-one-out cross-validation; the two differ only in their goal.
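As a minimal illustrative sketch, the following Python code computes delete-1 jackknife estimates of the bias and standard error of a statistic, here the sample mean; the data are arbitrary.

```python
import numpy as np

def jackknife(sample, statistic=np.mean):
    """Delete-1 jackknife estimates of the bias and standard error of `statistic`."""
    sample = np.asarray(sample, float)
    n = sample.size
    theta_hat = statistic(sample)
    # Recompute the statistic n times, each time leaving out one observation.
    replicates = np.array([statistic(np.delete(sample, i)) for i in range(n)])
    bias = (n - 1) * (replicates.mean() - theta_hat)
    std_error = np.sqrt((n - 1) / n * np.sum((replicates - replicates.mean()) ** 2))
    return bias, std_error

# Example usage with made-up data:
bias, se = jackknife([2.1, 3.4, 2.9, 5.0, 3.8, 4.2, 2.5])
print(f"jackknife bias = {bias:.4f}, jackknife SE = {se:.4f}")
```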


Comparison of bootstrap and jackknife

Both methods, the bootstrap and the jackknife, estimate the variability of a statistic from the variability of that statistic between subsamples, rather than from parametric assumptions. For the more general jackknife, the delete-''m'' observations jackknife, the bootstrap can be seen as a random approximation of it. Both yield similar numerical results, which is why each can be seen as an approximation to the other. Although there are huge theoretical differences in their mathematical insights, the main practical difference for statistics users is that the bootstrap gives different results when repeated on the same data, whereas the jackknife gives exactly the same result each time. Because of this, the jackknife is popular when the estimates need to be verified several times before publishing (e.g., by official statistics agencies). On the other hand, when this verification feature is not crucial and it is of interest not to have a number but just an idea of its distribution, the bootstrap is preferred (e.g., in studies in physics, economics and the biological sciences). Whether to use the bootstrap or the jackknife may depend more on operational aspects than on statistical concerns of a survey.

The jackknife, originally used for bias reduction, is more of a specialized method and only estimates the variance of the point estimator. This can be enough for basic statistical inference (e.g., hypothesis testing, confidence intervals). The bootstrap, on the other hand, first estimates the whole distribution (of the point estimator) and then computes the variance from that. While powerful and easy, this can become highly computationally intensive. "The bootstrap can be applied to both variance and distribution estimation problems. However, the bootstrap variance estimator is not as good as the jackknife or the balanced repeated replication (BRR) variance estimator in terms of the empirical results. Furthermore, the bootstrap variance estimator usually requires more computations than the jackknife or the BRR. Thus, the bootstrap is mainly recommended for distribution estimation."

There is a special consideration with the jackknife, particularly with the delete-1 observation jackknife: it should only be used with smooth, differentiable statistics (e.g., totals, means, proportions, ratios, odds ratios, regression coefficients), not with medians or quantiles. This can become a practical disadvantage, and it is usually the argument favoring bootstrapping over jackknifing. More general jackknifes than the delete-1, such as the delete-''m'' jackknife or the delete-all-but-2 Hodges–Lehmann estimator, overcome this problem for medians and quantiles by relaxing the smoothness requirements for consistent variance estimation.

Usually the jackknife is easier to apply to complex sampling schemes than the bootstrap. Complex sampling schemes may involve stratification, multiple stages (clustering), varying sampling weights (non-response adjustments, calibration, post-stratification) and unequal-probability sampling designs. Theoretical aspects of both the bootstrap and the jackknife can be found in Shao and Tu (1995), whereas a basic introduction is given in Wolter (2007). The bootstrap estimate of model prediction bias is more precise than jackknife estimates with linear models such as the linear discriminant function or multiple regression.
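The following Python sketch illustrates the practical difference described above: repeated jackknife computations of the standard error of the mean give exactly the same value, whereas repeated bootstrap computations vary with the random resamples (the data and the number of bootstrap resamples are arbitrary choices).

```python
import numpy as np

data = np.array([2.1, 3.4, 2.9, 5.0, 3.8, 4.2, 2.5])
n = data.size

def jackknife_se():
    """Delete-1 jackknife SE of the mean: a deterministic function of the data."""
    reps = np.array([np.delete(data, i).mean() for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

def bootstrap_se(rng):
    """Bootstrap SE of the mean: depends on the random resamples drawn."""
    reps = np.array([rng.choice(data, size=n, replace=True).mean() for _ in range(2_000)])
    return reps.std(ddof=1)

# Two runs of each: the jackknife repeats exactly, the bootstrap fluctuates slightly.
print("jackknife SE:", jackknife_se(), jackknife_se())
print("bootstrap SE:", bootstrap_se(np.random.default_rng()), bootstrap_se(np.random.default_rng()))
```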


See also

* Bootstrap aggregating (bagging)
* Genetic algorithm
* Monte Carlo method
* Nonparametric statistics
* Particle filter
* Pseudoreplication
* Non-uniform random variate generation
* Random permutation
* Replication (statistics)
* Surrogate data testing


References


Bibliography

* Good, P. (2006). ''Resampling Methods''. 3rd ed. Birkhäuser.
* Wolter, K. M. (2007). ''Introduction to Variance Estimation''. 2nd ed. Springer.
* Del Moral, Pierre (2004). ''Feynman–Kac Formulae: Genealogical and Interacting Particle Systems with Applications''. Springer, Series Probability and Applications.
* Del Moral, Pierre (2013). ''Mean Field Simulation for Monte Carlo Integration''. Chapman & Hall/CRC Press, Monographs on Statistics and Applied Probability.


External links


Software

* Angelo Canty and Brian Ripley (2010). boot: Bootstrap R (S-Plus) Functions. R package version 1.2-43. Functions and datasets for bootstrapping from the book ''Bootstrap Methods and Their Applications'' by A. C. Davison and D. V. Hinkley (1997, CUP).
* Statistics101: Resampling, Bootstrap, Monte Carlo Simulation program.
* R package `samplingVarEst': Sampling Variance Estimation. Implements functions for estimating the sampling variance of some point estimators.


* https://github.com/searchivarius/PermTest Randomization/permutation tests to evaluate outcomes in information retrieval experiments (with and without adjustments for multiple comparisons).
* Bioconductor: resampling-based multiple hypothesis testing with applications to genomics.
* http://rosetta.ahmedmoustafa.io/bootstrap/ Bootstrap Resampling: interactive demonstration of hypothesis testing with bootstrap resampling in R.
* Permutation Test: interactive demonstration of hypothesis testing with permutation test in R.