In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type of exact simulation method. The method works for any distribution in $\mathbb{R}^m$ with a density.
Rejection sampling is based on the observation that to sample a random variable in one dimension, one can perform a uniformly random sampling of the two-dimensional Cartesian graph, and keep the samples in the region under the graph of its density function.
Note that this property can be extended to N-dimensional functions.
Description
To visualize the motivation behind rejection sampling, imagine graphing the density function of a random variable onto a large rectangular board and throwing darts at it. Assume that the darts are uniformly distributed around the board. Now remove all of the darts that are outside the area under the curve. The remaining darts will be distributed uniformly within the area under the curve, and the x-positions of these darts will be distributed according to the random variable's density. This is because there is the most room for the darts to land where the curve is highest and thus the probability density is greatest.
The visualization just described is equivalent to a particular form of rejection sampling where the "proposal distribution" is uniform (hence its graph is a rectangle). The general form of rejection sampling assumes that the board is not necessarily rectangular but is shaped according to the density of some proposal distribution that we know how to sample from (for example, using inversion sampling), and which is at least as high at every point as the distribution we want to sample from, so that the former completely encloses the latter. (Otherwise, there would be parts of the curved area we want to sample from that could never be reached.)
Rejection sampling works as follows:
#Sample a point on the x-axis from the proposal distribution.
#Draw a vertical line at this x-position, up to the y-value of the proposal distribution's curve at that position.
#Sample uniformly along this line from 0 to its top. If the sampled value is greater than the value of the desired distribution at this vertical line, reject the x-value and return to step 1; otherwise the x-value is a sample from the desired distribution.
This algorithm can be used to sample from the area under any curve, regardless of whether the function integrates to 1. In fact, scaling a function by a constant has no effect on the sampled x-positions. Thus, the algorithm can be used to sample from a distribution whose normalizing constant is unknown, which is common in computational statistics.
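As a minimal sketch of the dart-board picture, the Python below covers only the special case of a uniform proposal on a bounded interval, where the enclosing curve is a rectangle of constant height. The names `rejection_sample`, `unnormalized_beta`, and the bound `0.082` are illustrative assumptions, not part of any library. The target is deliberately left unnormalized (it is proportional to a Beta(2, 5) density) to show that the normalizing constant is never needed.

```python
import random


def rejection_sample(target, x_min, x_max, height, rng):
    """Draw one x from the density proportional to `target` on [x_min, x_max].

    Assumes `height` is an upper bound on `target` over the whole interval.
    """
    while True:
        x = rng.uniform(x_min, x_max)   # propose an x-position uniformly
        y = rng.uniform(0.0, height)    # pick a height on the vertical line at x
        if y <= target(x):              # keep the dart only if it lies under the curve
            return x


def unnormalized_beta(x):
    # Proportional to the Beta(2, 5) density; the normalizing constant is omitted.
    return x * (1.0 - x) ** 4


rng = random.Random(0)  # fixed seed so the run is reproducible
# The maximum of x * (1 - x)**4 on [0, 1] is 0.08192 (at x = 0.2), so 0.082 is a valid bound.
samples = [rejection_sample(unnormalized_beta, 0.0, 1.0, 0.082, rng)
           for _ in range(20000)]
mean = sum(samples) / len(samples)
print(mean)  # should be close to the Beta(2, 5) mean, 2/7
```

Multiplying `unnormalized_beta` and `height` by the same positive constant leaves the accepted x-values unchanged, which is the scaling property stated above. A tighter `height` wastes fewer proposals; a bound that is too small would silently bias the samples.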
Theory
The rejection sampling method generates sampling values from a target distribution $X$ with arbitrary probability density function $f(x)$ by using a proposal distribution $Y$ with probability density $g(x)$. The idea is that one can generate a sample value from $X$ by instead sampling from $Y$ and accepting the sample from $Y$ with probability $f(x)/(M g(x))$, repeating the draws from $Y$ until a value is accepted. $M$ here is a constant, finite bound on the likelihood ratio $f(x)/g(x)$, satisfying $M < \infty$ over the support of $X$; in other words, $M$ must satisfy $f(x) \leq M g(x)$ for all values of $x$. Note that this requires that the support of $Y$ must include the support of $X$; that is, $g(x) > 0$ whenever $f(x) > 0$.
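The acceptance rule can be sketched in Python for a non-uniform proposal. The example is an assumption chosen for illustration: the target $f$ is the half-normal density on $x \geq 0$, the proposal $g$ is the Exponential(1) density, and $M = \sqrt{2e/\pi}$ is the maximum of $f(x)/g(x)$, attained at $x = 1$. The function names are hypothetical. When both densities are normalized, the long-run fraction of accepted proposals should be close to $1/M$.

```python
import math
import random


def f(x):
    # Target: half-normal density on x >= 0 (assumed for this illustration).
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)


def g(x):
    # Proposal: Exponential(1) density on x >= 0.
    return math.exp(-x)


# f(x) / g(x) is maximized at x = 1, giving the bound M = sqrt(2e / pi).
M = math.sqrt(2.0 * math.e / math.pi)


def sample_target(rng):
    """Return one accepted draw and the number of proposals it took."""
    proposals = 0
    while True:
        y = rng.expovariate(1.0)        # draw from the proposal g
        proposals += 1
        u = rng.random()                # uniform on [0, 1)
        if u < f(y) / (M * g(y)):       # accept with probability f(y) / (M g(y))
            return y, proposals


rng = random.Random(1)  # fixed seed so the run is reproducible
draws = [sample_target(rng) for _ in range(20000)]
values = [value for value, _ in draws]
total_proposals = sum(count for _, count in draws)

sample_mean = sum(values) / len(values)
acceptance_rate = len(values) / total_proposals
print(sample_mean, math.sqrt(2.0 / math.pi))  # sample mean vs. half-normal mean
print(acceptance_rate, 1.0 / M)               # observed vs. expected acceptance rate
```

The closer $M g(x)$ hugs $f(x)$, the smaller $M$ can be and the fewer proposals are discarded, so the choice of proposal is mainly a trade-off between ease of sampling and tightness of the envelope.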
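The validation of this method is the envelope principle: when simulating the pair $(x, v = u \cdot M g(x))$, with $x$ drawn from $g$ and $u$ uniform on $(0, 1)$, one produces a uniform simulation over the subgraph of $M g(x)$. Accepting only pairs such that $u < f(x)/(M g(x))$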