Non-uniform random variate generation or pseudo-random number sampling is the
numerical practice of generating
pseudo-random numbers (PRN) that follow a given
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...
.
Methods are typically based on the availability of a
uniformly distributed PRN generator. Computational algorithms are then used to manipulate a single
random variate
In probability and statistics, a random variate or simply variate is a particular outcome of a ''random variable'': the random variates which are other outcomes of the same random variable might have different values ( random numbers).
A random ...
, ''X'', or often several such variates, into a new random variate ''Y'' such that these values have the required distribution.
The first methods were developed for
Monte-Carlo simulations in the
Manhattan project
The Manhattan Project was a research and development undertaking during World War II that produced the first nuclear weapons. It was led by the United States with the support of the United Kingdom and Canada. From 1942 to 1946, the project w ...
, published by
John von Neumann
John von Neumann (; hu, Neumann János Lajos, ; December 28, 1903 – February 8, 1957) was a Hungarian-American mathematician, physicist, computer scientist, engineer and polymath. He was regarded as having perhaps the widest cove ...
in the early 1950s.
Finite discrete distributions
For a
discrete probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
with a finite number ''n'' of indices at which the
probability mass function
In probability and statistics, a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes it is also known as the discrete density function. The probability mass ...
''f'' takes non-zero values, the basic sampling algorithm is straightforward. The interval
[0, 1
) is divided in ''n'' intervals [0, ''f''(1)), [''f''(1), ''f''(1) + ''f''(2)), ... The width of interval ''i'' equals the probability ''f''(''i'').
One draws a uniformly distributed pseudo-random number ''X'', and searches for the index ''i'' of the corresponding interval. The so determined ''i'' will have the distribution ''f''(''i'').
Formalizing this idea becomes easier by using the cumulative distribution function
:
It is convenient to set ''F''(0) = 0. The ''n'' intervals are then simply [''F''(0), ''F''(1)), [''F''(1), ''F''(2)), ..., [''F''(''n'' − 1), ''F''(''n'')). The main computational task is then to determine ''i'' for which ''F''(''i'' − 1) ≤ ''X'' < ''F''(''i'').
This can be done by different algorithms:
* Linear search, computational time linear in ''n''.
* Binary search, computational time goes with log ''n''.
* Indexed search, also called the ''cutpoint method''.
* Alias method, computational time is constant, using some pre-computed tables.
* There are other methods that cost constant time.
[Fishman (1996) ]
Continuous distributions
Generic methods for generating
independent
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s
* Independe ...
samples:
*
Rejection sampling
In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type of ...
for arbitrary density functions
*
Inverse transform sampling for distributions whose
CDF is known
*
Ratio of uniforms, combining a change of variables and rejection sampling
*
Slice sampling
Slice sampling is a type of Markov chain Monte Carlo algorithm for pseudo-random number sampling, i.e. for drawing random samples from a statistical distribution. The method is based on the observation that to sample a random variable one can sampl ...
*
Ziggurat algorithm
The ziggurat algorithm is an algorithm for pseudo-random number sampling. Belonging to the class of rejection sampling algorithms, it relies on an underlying source of uniformly-distributed random numbers, typically from a pseudo-random number ...
, for monotonically decreasing density functions as well as symmetric unimodal distributions
*
Convolution random number generator, not a sampling method in itself: it describes the use of arithmetics on top of one or more existing sampling methods to generate more involved distributions.
Generic methods for generating
correlated
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statisti ...
samples (often necessary for unusually-shaped or high-dimensional distributions):
*
Markov chain Monte Carlo
In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain ...
, the general principle
*
Metropolis–Hastings algorithm
In statistics and statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult. This seq ...
*
Gibbs sampling
In statistics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is diff ...
*
Slice sampling
Slice sampling is a type of Markov chain Monte Carlo algorithm for pseudo-random number sampling, i.e. for drawing random samples from a statistical distribution. The method is based on the observation that to sample a random variable one can sampl ...
*
Reversible-jump Markov chain Monte Carlo, when the number of dimensions is not fixed (e.g. when estimating a
mixture model
In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observat ...
and simultaneously estimating the number of mixture components)
*
Particle filter
Particle filters, or sequential Monte Carlo methods, are a set of Monte Carlo algorithms used to solve filtering problems arising in signal processing and Bayesian statistical inference. The filtering problem consists of estimating the int ...
s, when the observed data is connected in a
Markov chain
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...
and should be processed sequentially
For generating a
normal distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu i ...
:
*
Box–Muller transform
*
Marsaglia polar method
The Marsaglia polar method is a pseudo-random number sampling method for generating a pair of independent standard normal random variables.
Standard normal random variables are frequently used in computer science, computational statistics, and ...
For generating a
Poisson distribution
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known ...
:
* See
Poisson distribution#Generating Poisson-distributed random variables
Software libraries
GNU Scientific Library has a section entitled "Random Number Distributions" with routines for sampling under more than twenty different distributions.
See also
*
Beta distribution#Random variate generation
*
Dirichlet distribution#Random variate generation
*
Exponential distribution#Random variate generation
*
Gamma distribution#Random variate generation
*
Gumbel distribution#Random variate generation
*
Laplace distribution#Random variate generation
*
Multinomial distribution#Random variate distribution
*
Pareto distribution#Random variate generation
*
Poisson distribution#Random variate generation
Footnotes
Literature
* Devroye, L. (1986)
Non-Uniform Random Variate Generation'' New York: Springer
* Fishman, G.S. (1996)
Monte Carlo. Concepts, Algorithms, and Applications'' New York: Springer
* Hörmann, W.; J Leydold, G Derflinger (2004,2011)
Automatic Nonuniform Random Variate Generation'' Berlin: Springer.
*
Knuth, D.E. (1997) ''
The Art of Computer Programming
''The Art of Computer Programming'' (''TAOCP'') is a comprehensive monograph written by the computer scientist Donald Knuth presenting programming algorithms and their analysis. Volumes 1–5 are intended to represent the central core of comp ...
'', Vol. 2 ''Seminumerical Algorithms'', Chapter 3.4.1 (3rd edition).
* Ripley, B.D. (1987) ''Stochastic Simulation''. Wiley.
{{DEFAULTSORT:Pseudo-Random Number Sampling
Pseudorandom number generators
Non-uniform random numbers