Bootstrapping is any test or metric that uses random sampling with replacement (e.g. mimicking the sampling process), and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates.
This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.
Bootstrapping estimates the properties of an estimand (such as its variance) by measuring those properties when sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution function of the observed data. In the case where a set of observations can be assumed to be from an independent and identically distributed population, this can be implemented by constructing a number of resamples with replacement of the observed data set (each of equal size to the observed data set).
It may also be used for constructing hypothesis tests. It is often used as an alternative to statistical inference based on the assumption of a parametric model when that assumption is in doubt, or where parametric inference is impossible or requires complicated formulas for the calculation of standard errors.
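As a concrete illustration of the resampling scheme just described, the following is a minimal sketch in Python using NumPy; the data array and the choice of statistic are hypothetical placeholders, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_estimates(data, statistic, n_resamples=10_000):
    """Draw resamples with replacement (same size as the data) and
    return the statistic evaluated on each resample."""
    data = np.asarray(data)
    n = len(data)
    estimates = np.empty(n_resamples)
    for b in range(n_resamples):
        resample = rng.choice(data, size=n, replace=True)
        estimates[b] = statistic(resample)
    return estimates

# Hypothetical data; any i.i.d. sample of a numeric quantity would do.
data = rng.normal(loc=10.0, scale=2.0, size=50)
boot = bootstrap_estimates(data, np.median)
print("bootstrap estimate of the standard error of the median:", boot.std(ddof=1))
```

The spread of the returned bootstrap estimates serves as an estimate of the standard error of the chosen statistic.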
History
The bootstrap was published by Bradley Efron in "Bootstrap methods: another look at the jackknife" (1979), inspired by earlier work on the jackknife.[Quenouille M (1949) Approximate tests of correlation in time-series. J Roy Statist Soc Ser B 11 68–84][Tukey J (1958) Bias and confidence in not-quite large samples (abstract). Ann Math Statist 29 614][Jaeckel L (1972) The infinitesimal jackknife. Memorandum MM72-1215-11, Bell Lab] Improved estimates of the variance were developed later.[Bickel P, Freedman D (1981) Some asymptotic theory for the bootstrap. Ann Statist 9 1196–1217][Singh K (1981) On the asymptotic accuracy of Efron's bootstrap. Ann Statist 9 1187–1195] A Bayesian extension was developed in 1981.[Rubin D (1981) The Bayesian bootstrap. Ann Statist 9 130–134] The bias-corrected and accelerated (BCa) bootstrap was developed by Efron in 1987, and the ABC procedure in 1992.[DiCiccio T, Efron B (1992) More accurate confidence intervals in exponential families. Biometrika 79 231–245]
Approach
The basic idea of bootstrapping is that inference about a population from sample data (sample → population) can be modeled by ''resampling'' the sample data and performing inference about a sample from resampled data (resampled → sample). As the population is unknown, the true error in a sample statistic against its population value is unknown. In bootstrap-resamples, the 'population' is in fact the sample, and this is known; hence the quality of inference of the 'true' sample from resampled data (resampled → sample) is measurable.
More formally, the bootstrap works by treating inference of the true probability distribution ''J'', given the original data, as being analogous to an inference of the empirical distribution ''Ĵ'', given the resampled data. The accuracy of inferences regarding ''Ĵ'' using the resampled data can be assessed because we know ''Ĵ''. If ''Ĵ'' is a reasonable approximation to ''J'', then the quality of inference on ''J'' can in turn be inferred.
As an example, assume we are interested in the average (or mean) height of people worldwide. We cannot measure all the people in the global population, so instead, we sample only a tiny part of it, and measure that. Assume the sample is of size ''N''; that is, we measure the heights of ''N'' individuals. From that single sample, only one estimate of the mean can be obtained. In order to reason about the population, we need some sense of the variability of the mean that we have computed. The simplest bootstrap method involves taking the original data set of heights, and, using a computer, sampling from it to form a new sample (called a 'resample' or bootstrap sample) that is also of size ''N''. The bootstrap sample is taken from the original by using sampling with replacement (e.g. we might 'resample' 5 times from [1,2,3,4,5] and get [2,5,4,4,1]), so, assuming ''N'' is sufficiently large, for all practical purposes there is virtually zero probability that it will be identical to the original "real" sample. This process is repeated a large number of times (typically 1,000 or 10,000 times), and for each of these bootstrap samples, we compute its mean (each of these is called a "bootstrap estimate"). We now can create a histogram of bootstrap means. This histogram provides an estimate of the shape of the distribution of the sample mean, from which we can answer questions about how much the mean varies across samples. (The method here, described for the mean, can be applied to almost any other statistic or estimator.)
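A sketch of the procedure just described, assuming a NumPy array holding the ''N'' measured heights; the heights here are simulated placeholders rather than real measurements.

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder for the N observed heights (in cm); in practice this is the real sample.
heights = rng.normal(loc=170.0, scale=10.0, size=200)
N = len(heights)

n_resamples = 10_000
boot_means = np.empty(n_resamples)
for b in range(n_resamples):
    resample = rng.choice(heights, size=N, replace=True)  # sample with replacement
    boot_means[b] = resample.mean()                       # one "bootstrap estimate"

# The histogram of boot_means approximates the sampling distribution of the mean.
counts, bin_edges = np.histogram(boot_means, bins=30)
print("spread of the bootstrap means (estimated standard error):", boot_means.std(ddof=1))
```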
Discussion
Advantages
A great advantage of bootstrap is its simplicity. It is a straightforward way to derive estimates of standard errors and confidence intervals for complex estimators of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients. Moreover, despite its simplicity, bootstrapping can be applied to complex sampling designs (e.g. for a population divided into ''s'' strata with ''n''s observations per stratum, bootstrapping can be applied within each stratum). Bootstrap is also an appropriate way to control and check the stability of the results. Although for most problems it is impossible to know the true confidence interval, bootstrap is asymptotically more accurate than the standard intervals obtained using sample variance and assumptions of normality.[DiCiccio TJ, Efron B (1996) Bootstrap confidence intervals (with Discussion). Statistical Science 11: 189–228] Bootstrapping is also a convenient method that avoids the cost of repeating the experiment to get other groups of sample data.
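For instance, a bootstrap percentile interval for a correlation coefficient (one of the "complex estimators" mentioned above) can be sketched in a few lines; the paired data below are simulated placeholders, and the percentile method is only one of several ways to form the interval.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired observations (x_i, y_i).
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(scale=0.8, size=100)
n = len(x)

n_resamples = 10_000
boot_corr = np.empty(n_resamples)
for b in range(n_resamples):
    idx = rng.integers(0, n, size=n)           # resample pairs with replacement
    boot_corr[b] = np.corrcoef(x[idx], y[idx])[0, 1]

# 95% bootstrap percentile confidence interval for the correlation coefficient.
lo, hi = np.percentile(boot_corr, [2.5, 97.5])
print(f"95% percentile interval for the correlation: [{lo:.3f}, {hi:.3f}]")
```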
Disadvantages
Bootstrapping depends heavily on the estimator used and, though simple, ignorant use of bootstrapping will not always yield asymptotically valid results and can lead to inconsistency. Although bootstrapping is (under some conditions) asymptotically consistent, it does not provide general finite-sample guarantees. Furthermore, the result may depend on how representative the sample is. The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples, or a large enough sample size) where these would be more formally stated in other approaches. Also, bootstrapping can be time-consuming, and there is not much software readily available for bootstrapping, as it can be difficult to automate using traditional statistical computer packages.
Recommendations
Scholars have recommended more bootstrap samples as available computing power has increased. If the results may have substantial real-world consequences, then one should use as many samples as is reasonable, given available computing power and time. Increasing the number of samples cannot increase the amount of information in the original data; it can only reduce the effects of random sampling errors which can arise from a bootstrap procedure itself. Moreover, there is evidence that numbers of samples greater than 100 lead to negligible improvements in the estimation of standard errors. In fact, according to the original developer of the bootstrapping method, even setting the number of samples at 50 is likely to lead to fairly good standard error estimates.
Adèr et al. recommend the bootstrap procedure for the following situations:
:*When the theoretical distribution of a statistic of interest is complicated or unknown. Since the bootstrapping procedure is distribution-independent, it provides an indirect method to assess the properties of the distribution underlying the sample and the parameters of interest that are derived from this distribution.
:*When the sample size is insufficient for straightforward statistical inference. If the underlying distribution is well-known, bootstrapping provides a way to account for the distortions caused by the specific sample that may not be fully representative of the population.
:*When power calculations have to be performed, and a small pilot sample is available. Most power and sample size calculations are heavily dependent on the standard deviation of the statistic of interest. If the estimate used is incorrect, the required sample size will also be wrong. One method to get an impression of the variation of the statistic is to use a small pilot sample and perform bootstrapping on it to get an impression of the variance (a sketch of this idea follows the list).
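As an illustration of the last point, one might bootstrap a small pilot sample to estimate the standard deviation of the statistic of interest before plugging it into a conventional sample-size formula. The pilot data, the effect size delta, and the z-quantiles below are hypothetical choices made only for the sketch.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical small pilot sample.
pilot = rng.normal(loc=5.0, scale=2.5, size=15)

# Bootstrap the pilot sample to gauge the variability of the sample mean.
n_resamples = 10_000
boot_means = np.array([rng.choice(pilot, size=len(pilot), replace=True).mean()
                       for _ in range(n_resamples)])
sd_estimate = boot_means.std(ddof=1) * np.sqrt(len(pilot))  # rough per-observation SD

# Plug the estimated SD into a standard sample-size formula
# n = ((z_{1-alpha/2} + z_{1-beta}) * sd / delta)^2, with hypothetical effect size delta.
z_alpha, z_beta, delta = 1.96, 0.84, 1.0
n_required = ((z_alpha + z_beta) * sd_estimate / delta) ** 2
print("rough required sample size:", int(np.ceil(n_required)))
```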
However, Athreya has shown that if one performs a naive bootstrap on the sample mean when the underlying population lacks a finite variance (for example, a power law distribution), then the bootstrap distribution will not converge to the same limit as the sample mean. As a result, confidence intervals on the basis of a Monte Carlo simulation of the bootstrap could be misleading. Athreya states that "Unless one is reasonably sure that the underlying distribution is not heavy tailed, one should hesitate to use the naive bootstrap".
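The following small experiment is one way to see the kind of instability Athreya warns about: with a heavy-tailed (infinite-variance) population, bootstrap estimates computed from different original samples tend to disagree badly. The Pareto tail index of 1.5 and the sample size are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

def boot_se_of_mean(sample, n_resamples=5_000):
    """Bootstrap estimate of the standard error of the sample mean."""
    n = len(sample)
    means = np.array([rng.choice(sample, size=n, replace=True).mean()
                      for _ in range(n_resamples)])
    return means.std(ddof=1)

# Pareto-type population with tail index 1.5: finite mean, infinite variance.
for run in range(3):
    sample = rng.pareto(1.5, size=500) + 1.0
    print(f"sample {run}: bootstrap SE of the mean = {boot_se_of_mean(sample):.3f}")
# The three estimates will often differ substantially, unlike in the finite-variance case.
```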
Types of bootstrap scheme
In univariate problems, it is usually acceptable to resample the individual observations with replacement ("case resampling" below), unlike subsampling, in which resampling is without replacement and which is valid under much weaker conditions than the bootstrap. In small samples, a parametric bootstrap approach might be preferred. For other problems, a ''smooth bootstrap'' will likely be preferred.
For regression problems, various other alternatives are available.
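For illustration, two of these variants can be sketched as follows; the normal model in the parametric branch and the rule-of-thumb kernel bandwidth in the smooth branch are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(loc=0.0, scale=1.0, size=30)   # hypothetical small sample
n = len(data)

# Parametric bootstrap: fit a parametric model (here, a normal distribution)
# and resample from the fitted model rather than from the data themselves.
mu_hat, sigma_hat = data.mean(), data.std(ddof=1)
parametric_resample = rng.normal(mu_hat, sigma_hat, size=n)

# Smooth bootstrap: resample the data with replacement and add a small amount of
# random noise (a kernel with bandwidth h), smoothing the discrete empirical distribution.
h = 1.06 * sigma_hat * n ** (-1 / 5)             # a common rule-of-thumb bandwidth
smooth_resample = rng.choice(data, size=n, replace=True) + rng.normal(0.0, h, size=n)

print(parametric_resample.mean(), smooth_resample.mean())
```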
Case resampling
The bootstrap is generally useful for estimating the distribution of a statistic (e.g. mean, variance) without using normality assumptions (as required, e.g., for a z-statistic or a t-statistic). In particular, the bootstrap is useful when there is no analytical form or an asymptotic theory (e.g., an applicable central limit theorem) to help estimate the distribution of the statistics of interest. This is because bootstrap methods can apply to most random quantities, e.g., the ratio of variance and mean. There are at least two ways of performing case resampling.
# The Monte Carlo algorithm for case resampling is quite simple. First, we resample the data with replacement, and the size of the resample must be equal to the size of the original data set. Then the statistic of interest is computed from the resample from the first step. We repeat this routine many times to get a more precise estimate of the bootstrap distribution of the statistic.
# The 'exact' version for case resampling is similar, but we exhaustively enumerate every possible resample of the data set. This can be computationally expensive, as there are a total of C(2''n'' − 1, ''n'') different resamples (the number of multisets of size ''n'' drawn from ''n'' items), where ''n'' is the size of the data set. Thus for ''n'' = 5, 10, 20, 30 there are 126, 92378, 6.89 × 10¹⁰ and 5.91 × 10¹⁶ different resamples respectively. (A sketch of the exact enumeration follows this list.)
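A sketch of the exact version for a tiny, hypothetical data set, enumerating every multiset-resample and weighting each by its probability under sampling with replacement; this is feasible only for very small ''n''.

```python
from itertools import combinations_with_replacement
from math import comb, factorial
from collections import Counter
import numpy as np

data = [1.2, 3.4, 2.2, 5.0, 4.1]      # hypothetical tiny data set
n = len(data)
print("number of distinct resamples:", comb(2 * n - 1, n))   # 126 for n = 5

exact_means, probs = [], []
for resample in combinations_with_replacement(range(n), n):
    counts = Counter(resample)
    # Probability of this multiset under i.i.d. sampling with replacement:
    # multinomial coefficient divided by n^n.
    weight = factorial(n)
    for c in counts.values():
        weight //= factorial(c)
    probs.append(weight / n ** n)
    exact_means.append(np.mean([data[i] for i in resample]))

# Exact bootstrap mean and standard error of the sample mean.
exact_means, probs = np.array(exact_means), np.array(probs)
mu = np.sum(probs * exact_means)
se = np.sqrt(np.sum(probs * (exact_means - mu) ** 2))
print("exact bootstrap SE of the mean:", se)
```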
Estimating the distribution of the sample mean
Consider a coin-flipping experiment. We flip the coin and record whether it lands heads or tails. Let ''X'' = ''x''1, ''x''2, …, ''x''10 be 10 observations from the experiment, with ''x''i = 1 if the ''i''-th flip lands heads, and 0 otherwise. By invoking the assumption that the average of the coin flips is normally distributed, we can use the t-statistic to estimate the distribution of the sample mean,
: ''x̄'' = (1/10)(''x''1 + ''x''2 + … + ''x''10).
Such a normality assumption can be justified either as an approximation of the distribution of each ''individual'' coin flip or as an approximation of the distribution of the ''average'' of a large number of coin flips. The former is a poor approximation because the true distribution of the coin flips is Bernoulli instead of normal. The latter is a valid approximation in ''infinitely large'' samples due to the central limit theorem.
However, if we are not ready to make such a justification, then we can use the bootstrap instead. Using case resampling, we can derive the distribution of ''x̄''. We first resample the data to obtain a ''bootstrap resample'' ''X''1*. Since a bootstrap resample comes from sampling with replacement from the data, it will typically contain some of the original observations more than once and omit others, while the number of data points in a bootstrap resample is equal to the number of data points in our original observations. Then we compute the mean of this resample and obtain the first ''bootstrap mean'': ''μ''1*. We repeat this process to obtain the second resample ''X''2* and compute the second bootstrap mean ''μ''2*. If we repeat this 100 times, then we have ''μ''1*, ''μ''2*, ..., ''μ''100*. This represents an ''empirical bootstrap distribution'' of the sample mean. From this empirical distribution, one can derive a ''bootstrap confidence interval'' for the purpose of hypothesis testing.
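A sketch of this comparison for ten coin flips, contrasting the t-based interval with a bootstrap percentile interval; the particular 0/1 values and the hard-coded t quantile are placeholders chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(11)

flips = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])   # hypothetical observations
n = len(flips)

# t-based interval, relying on the normality assumption discussed above.
xbar, s = flips.mean(), flips.std(ddof=1)
t_975 = 2.262                                       # t quantile for 9 degrees of freedom
t_interval = (xbar - t_975 * s / np.sqrt(n), xbar + t_975 * s / np.sqrt(n))

# Bootstrap percentile interval via case resampling.
boot_means = np.array([rng.choice(flips, size=n, replace=True).mean()
                       for _ in range(10_000)])
boot_interval = tuple(np.percentile(boot_means, [2.5, 97.5]))

print("t interval:        ", t_interval)
print("bootstrap interval:", boot_interval)
```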
Regression
In regression problems, ''case resampling'' refers to the simple scheme of resampling individual cases – often rows of a data set. For regression problems, as long as the data set is fairly large, this simple scheme is often acceptable. However, the method is open to criticism.
In regression problems, the explanatory variables are often fixed, or at least observed with more control than the response variable. Also, the range of the explanatory variables defines the information available from them. Therefore, to resample cases means that each bootstrap sample will lose some information. As such, alternative bootstrap procedures should be considered.
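A sketch of case resampling for a simple linear regression, bootstrapping whole (x, y) rows to gauge the variability of the fitted slope; the data are simulated placeholders and the least-squares fit is only one possible choice of estimator.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical regression data: y = 2 + 3x + noise.
x = rng.uniform(0, 10, size=80)
y = 2.0 + 3.0 * x + rng.normal(scale=2.0, size=80)
n = len(x)

n_resamples = 5_000
boot_slopes = np.empty(n_resamples)
for b in range(n_resamples):
    idx = rng.integers(0, n, size=n)          # resample whole cases (rows)
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    boot_slopes[b] = slope

print("bootstrap SE of the slope:", boot_slopes.std(ddof=1))
```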
Bayesian bootstrap
Bootstrapping can be interpreted in a Bayesian framework using a scheme that creates new data sets through reweighting the initial data. Given a set of ''N'' data points, the weight assigned to data point ''i'' in a new data set is ''w''i = ''u''(i) − ''u''(i−1), where ''u''(1), …, ''u''(N−1) is a low-to-high ordered list of ''N'' − 1 uniformly distributed random numbers on [0, 1], preceded by ''u''(0) = 0 and followed by ''u''(N) = 1.
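A sketch of this weighting scheme applied to the mean; the data are placeholders, and the gaps between ordered uniforms used here are equivalent to drawing the weight vector from a flat Dirichlet distribution.

```python
import numpy as np

rng = np.random.default_rng(9)

data = rng.normal(loc=0.0, scale=1.0, size=25)   # hypothetical data points
N = len(data)

n_resamples = 10_000
boot_means = np.empty(n_resamples)
for b in range(n_resamples):
    # Gaps between ordered uniforms on [0, 1] (with 0 and 1 appended) give the weights;
    # this is the same as drawing the weight vector from Dirichlet(1, ..., 1).
    u = np.sort(rng.uniform(size=N - 1))
    weights = np.diff(np.concatenate(([0.0], u, [1.0])))
    boot_means[b] = np.sum(weights * data)       # weighted mean for this replicate

print("Bayesian bootstrap SE of the mean:", boot_means.std(ddof=1))
```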