The ziggurat algorithm is an

algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...

for

pseudo-random number sampling Non-uniform random variate generation or pseudo-random number sampling is the numerical practice of generating pseudo-random numbers (PRN) that follow a given probability distribution. Methods are typically based on the availability of a unifo ...

. Belonging to the class of

rejection sampling In numerical analysis and computational statistics, rejection sampling is a basic technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm" and is a type o ...

algorithms, it relies on an underlying source of uniformly-distributed random numbers, typically from a pseudo-random number generator, as well as precomputed tables. The algorithm is used to generate values from a monotonically decreasing

probability distribution In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...

. It can also be applied to

symmetric Symmetry () in everyday life refers to a sense of harmonious and beautiful proportion and balance. In mathematics, the term has a more precise definition and is usually used to refer to an object that is invariant under some transformations ...

unimodal distribution In mathematics, unimodality means possessing a unique mode (statistics), mode. More generally, unimodality means there is only a single highest value, somehow defined, of some mathematical object. Unimodal probability distribution In statis ...

s, such as the

normal distribution In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is f(x) = \frac ...

, by choosing a value from one half of the distribution and then randomly choosing which half the value is considered to have been drawn from. It was developed by George Marsaglia and others in the 1960s. A typical value produced by the algorithm only requires the generation of one random floating-point value and one random table index, followed by one table lookup, one multiply operation and one comparison. Sometimes (2.5% of the time, in the case of a normal or

exponential distribution In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuousl ...

when using typical table sizes) more computations are required. Nevertheless, the algorithm is computationally much faster than the two most commonly used methods of generating normally distributed random numbers, the Marsaglia polar method and the Box–Muller transform, which require at least one logarithm and one square root calculation for each pair of generated values. However, since the ziggurat algorithm is more complex to implement it is best used when large quantities of random numbers are required. The term ''ziggurat algorithm'' dates from Marsaglia's paper with Wai Wan Tsang in 2000; it is so named because it is conceptually based on covering the probability distribution with rectangular segments stacked in decreasing order of size, resulting in a figure that resembles a

ziggurat A ziggurat (; Cuneiform: 𒅆𒂍𒉪, Akkadian: ', D-stem of ' 'to protrude, to build high', cognate with other Semitic languages like Hebrew ''zaqar'' (זָקַר) 'protrude'), ( Persian: Chogha Zanbilچغازنجبیل) is a type of massive ...

Theory of operation

The ziggurat algorithm is a rejection sampling algorithm; it randomly generates a point in a distribution slightly larger than the desired distribution, then tests whether the generated point is inside the desired distribution. If not, it tries again. Given a random point underneath a probability density curve, its ''x'' coordinate is a random number with the desired distribution. The distribution the ziggurat algorithm chooses from is made up of ''n'' equal-area regions; ''n'' − 1 rectangles that cover the bulk of the desired distribution, on top of a non-rectangular base that includes the tail of the distribution. Given a monotone decreasing probability density function ''f''(''x''), defined for all ''x'' ≥ 0, the base of the ziggurat is defined as all points inside the distribution and below ''y''₁ = ''f''(''x''₁). This consists of a rectangular region from (0, 0) to (''x''₁, ''y''₁), and the (typically infinite) tail of the distribution, where ''x'' > ''x''₁ (and ''y'' < ''y''₁). This layer (call it layer 0) has area ''A''. On top of this, add a rectangular layer of width ''x''₁ and height ''A''/''x''₁, so it also has area ''A''. The top of this layer is at height ''y''₂ = ''y''₁ + ''A''/''x''₁, and intersects the density function at a point (''x''₂, ''y''₂), where ''y''₂ = ''f''(''x''₂). This layer includes every point in the density function between ''y''₁ and ''y''₂, but (unlike the base layer) also includes points such as (''x''₁, ''y''₂) which are not in the desired distribution. Further layers are then stacked on top. To use a precomputed table of size ''n'' (''n'' = 256 is typical), one chooses ''x''₁ such that ''x_n'' = 0, meaning that the top box, layer ''n'' − 1, reaches the distribution's peak at (0, ''f''(0)) exactly. Layer ''i'' extends vertically from ''y_i'' to ''y''_''i'' +1, and can be divided into two regions horizontally: the (generally larger) portion from 0 to ''x''_''i'' +1 which is entirely contained within the desired distribution, and the (small) portion from ''x''_''i'' +1 to ''x_i'', which is only partially contained. Ignoring for a moment the problem of layer 0, and given uniform random variables ''U''₀ and ''U''₁ ∈ [0,1), the ziggurat algorithm can be described as: # Choose a random layer 0 ≤ ''i'' < ''n''. # Let ''x'' = ''U''₀''x_i''. # If ''x'' < ''x''_''i'' +1, return ''x''. # Let ''y'' = ''y_i'' + ''U''₁(''y''_''i'' +1 − ''y_i''). # Compute ''f''(''x''). If ''y'' < ''f''(''x''), return ''x''. # Otherwise, choose new random numbers and go back to step 1. Step 1 amounts to choosing a low-resolution ''y'' coordinate. Step 3 tests if the ''x'' coordinate is clearly within the desired density function without knowing more about the y coordinate. If it is not, step 4 chooses a high-resolution y coordinate, and step 5 does the rejection test. With closely spaced layers, the algorithm terminates at step 3 a very large fraction of the time. For the top layer ''n'' − 1, however, this test always fails, because ''x_n'' = 0. Layer 0 can also be divided into a central region and an edge, but the edge is an infinite tail. To use the same algorithm to check if the point is in the central region, generate a fictitious ''x''₀ = ''A''/''y''₁. This will generate points with ''x'' < ''x''₁ with the correct frequency, and in the rare case that layer 0 is selected and ''x'' ≥ ''x''₁, use a special fallback algorithm to select a point at random from the tail. Because the fallback algorithm is used less than one time in a thousand, speed is not essential. Thus, the full ziggurat algorithm for one-sided distributions is: # Choose a random layer 0 ≤ ''i'' < ''n''. # Let ''x'' = ''U''₀''x_i'' # If ''x'' < ''x''_''i'' +1, return ''x''. # If ''i'' = 0, generate a point from the tail using the fallback algorithm. # Let ''y'' = ''y_i'' + ''U''₁(''y''_''i'' +1 − ''y_i''). # Compute ''f''(''x''). If ''y'' < ''f''(''x''), return ''x''. # Otherwise, choose new random numbers and go back to step 1. For a two-sided distribution, the result must be negated 50% of the time. This can often be done conveniently by choosing ''U''₀ ∈ (−1,1) and, in step 3, testing if , ''x'', < ''x''_''i'' +1.

Fallback algorithms for the tail

Because the ziggurat algorithm only generates ''most'' outputs very rapidly, and requires a fallback algorithm whenever ''x'' > ''x''₁, it is always more complex than a more direct implementation. The specific fallback algorithm depends on the distribution. For an exponential distribution, the tail looks just like the body of the distribution. One way is to fall back to the most elementary algorithm ''E'' = −ln(''U''₁) and let ''x'' = ''x''₁ − ln(''U''₁). Another is to call the ziggurat algorithm recursion, recursively and add ''x''₁ to the result. For a normal distribution, Marsaglia suggests a compact algorithm: # Let ''x'' = −ln(''U''₁)/''x''₁. # Let ''y'' = −ln(''U''₂). # If 2''y'' > ''x''², return ''x'' + ''x''₁. # Otherwise, go back to step 1. Since ''x''₁ ≈ 3.5 for typical table sizes, the test in step 3 is almost always successful. Since −ln(''U''₁) is an exponentially distributed variate, an implementation of the exponential distribution may be used.

Optimizations

The algorithm can be performed efficiently with precomputed tables of ''x_i'' and ''y_i'' = ''f''(''x_i''), but there are some modifications to make it even faster: * Nothing in the ziggurat algorithm depends on the probability distribution function being normalized (integral under the curve equal to 1), removing

normalizing constant In probability theory, a normalizing constant or normalizing factor is used to reduce any probability function to a probability density function with total probability of one. For example, a Gaussian function can be normalized into a probabilit ...

s can speed up the computation of ''f''(''x''). * Most uniform random number generators are based on integer random number generators which return an integer in the range , 2³² − 1 A table of 2⁻³²''x''_i lets you use such numbers directly for ''U''₀. * When computing two-sided distributions using a two-sided ''U''₀ as described earlier, the random integer can be interpreted as a signed number in the range ��2³¹, 2³¹ − 1 and a scale factor of 2⁻³¹ can be used. * Rather than comparing ''U''₀''x_i'' to ''x''_''i'' +1 in step 3, it is possible to precompute ''x''_''i'' +1/''x_i'' and compare ''U''₀ with that directly. If ''U''₀ is an integer random number generator, these limits may be premultiplied by 2³² (or 2³¹, as appropriate) so an integer comparison can be used. * With the above two changes, the table of unmodified ''x_i'' values is no longer needed and may be deleted. * When generating

IEEE 754 The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic originally established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard #Design rationale, add ...

single-precision floating point values, which only have a 24-bit mantissa (including the implicit leading 1), the least-significant bits of a 32-bit integer random number are not used. These bits may be used to select the layer number. (See the references below for a detailed discussion of this.) * The first three steps may be put into an

inline function In the C (programming language), C and C++ programming languages, an inline function is one qualified with the Keyword (computer programming), keyword inline; this serves two purposes: # It serves as a compiler directive that suggests (but doe ...

, which can call an out-of-line implementation of the less frequently needed steps.

Generating the tables

It is possible to store the entire table precomputed, or just include the values ''n'', ''y''₁, ''A'', and an implementation of ''f''⁻¹(''y'') in the source code, and compute the remaining values when initializing the random number generator. As previously described, you can find ''x_i'' = ''f''⁻¹(''y_i'') and ''y''_''i'' +1 = ''y_i'' + ''A''/''x_i''. Repeat ''n'' − 1 times for the layers of the ziggurat. At the end, you should have ''y_n'' = ''f''(0). There will be some

round-off error In computing, a roundoff error, also called rounding error, is the difference between the result produced by a given algorithm using exact arithmetic and the result produced by the same algorithm using finite-precision, rounded arithmetic. Roun ...

, but it is a useful sanity test to see that it is acceptably small. When actually filling in the table values, just assume that ''x_n'' = 0 and ''y_n'' = ''f''(0), and accept the slight difference in layer ''n'' − 1's area as rounding error.

Finding ''x''₁ and ''A''

Given an initial (guess at) ''x''₁, you need a way to compute the area ''t'' of the tail for which ''x'' > ''x''₁. For the exponential distribution, this is just ''e''^−''x''₁, while for the normal distribution, assuming you are using the unnormalized ''f''(''x'') = ''e''^−''x''²/2, this is erfc(''x''/). For more awkward distributions,

numerical integration In analysis, numerical integration comprises a broad family of algorithms for calculating the numerical value of a definite integral. The term numerical quadrature (often abbreviated to quadrature) is more or less a synonym for "numerical integr ...

may be required. With this in hand, from ''x''₁, you can find ''y''₁ = ''f''(''x''₁), the area ''t'' in the tail, and the area of the base layer ''A'' = ''x''₁''y''₁ + ''t''. Then compute the series ''y_i'' and ''x_i'' as above. If ''y_i'' > ''f''(0) for any ''i'' < ''n'', then the initial estimate ''x''₁ was too low, leading to too large an area ''A''. If ''y_n'' < ''f''(0), then the initial estimate ''x''₁ was too high. Given this, use a

root-finding algorithm In numerical analysis, a root-finding algorithm is an algorithm for finding zeros, also called "roots", of continuous functions. A zero of a function is a number such that . As, generally, the zeros of a function cannot be computed exactly nor ...

(such as the

bisection method In mathematics, the bisection method is a root-finding method that applies to any continuous function for which one knows two values with opposite signs. The method consists of repeatedly bisecting the interval defined by these values and t ...

) to find the value ''x''₁ which produces ''y''_''n''−1 as close to ''f''(0) as possible. Alternatively, look for the value which makes the area of the topmost layer, ''x''_''n''−1(''f''(0) − ''y''_''n''−1), as close to the desired value ''A'' as possible. This saves one evaluation of ''f''⁻¹(''x'') and is actually the condition of greatest interest.

McFarland's variation

Christopher D. McFarland has proposed a further-optimized version. This applies three algorithmic changes, at the expense of slightly larger tables. First, the common case considers only the rectangular portions, from (0, ''y''_''i'' −1) to (''x_i'', ''y_i'') The odd-shaped regions to the right of these (mostly almost triangular, plus the tail) are handled separately. This simplifies and speeds up the algorithm's

fast path Fast path is a term used in computer science to describe a path with shorter instruction path length through a program compared to the normal path. For a fast path to be effective it must handle the most commonly occurring tasks more efficiently tha ...

. Second, the exact area of the odd-shaped regions is used; they are ''not'' rounded up to include the entire rectangle to (''x''_''i'' −1, ''y_i''). This increases the probability that the fast path will be used. One major consequence of this is that the number of layers is slightly less than ''n''. Even though the area of the odd-shaped portions is taken exactly, the total adds up to more than one layer's worth. The area per layer is adjusted so that the number of rectangular layers is an integer. If the initial 0 ≤ ''i'' < ''n'' exceeds the number of rectangular layers, phase 2 proceeds. If the value sought lies in any of the odd-shaped regions, the alias method is used to choose one, based on its true area. This is a small amount of additional work, and requires additional alias tables, but chooses one of the layers' right-hand sides. The chosen odd-shaped region is rejection sampled, but if a sample is rejected, the algorithm does ''not'' return to the beginning. The true area of each odd-shaped region was used to choose a layer, so the rejection-sampling loop stays in that layer until a point is chosen. Third, the almost-triangular shape of most odd-shaped portions is exploited, although this must be divided into three cases depending on the

second derivative In calculus, the second derivative, or the second-order derivative, of a function is the derivative of the derivative of . Informally, the second derivative can be phrased as "the rate of change of the rate of change"; for example, the secon ...

of the probability distribution function in the selected layer. If the function is convex (as the exponential distribution is everywhere, and the normal distribution is for > 1), then the function is strictly contained within the lower triangle. Two unit uniform deviates ''U''₁ and ''U''₂ are chosen, and before they are scaled to the rectangle enclosing the odd-shaped region, their sum is tested. If ''U''₁ + ''U''₂ > 1, the point is in the upper triangle and can be reflected to (1−''U''₁, 1−''U''₂). Then, if ''U''₁ + ''U''₂ < 1−''ε'', for some suitable tolerance ''ε'', the point is definitely below the curve and can immediately be accepted. Only for points very close to the diagonal is it necessary to compute the distribution function ''f''(''x'') to perform an exact rejection test. (The tolerance ''ε'' should in theory depend on the layer, but a single maximum value can be used on all layers with little loss.) If the function is concave (as the normal distribution is for < 1), it includes a small portion of the upper triangle so reflection is impossible, but points whose normalized coordinates satisfy can be immediately accepted, and points for which can be immediately rejected. In the one layer which straddles = 1, the normal distribution has an inflection point, and the exact rejection test must be applied if 1−''ε'' <''U''₁ + ''U''₂ < 1+''ε''. The tail is handled as in the original Ziggurat algorithm, and can be thought of as a fourth case for the shape of the odd-shaped region to the right.

References

* This paper numbers the layers from 1 starting at the top, and makes layer 0 at the bottom a special case, while the explanation above numbers layers from 0 at the bottom.
C implementation of the ziggurat method for the normal density function and the exponential density function
that is essentially a copy of the code in the paper. (Potential users should be aware that this C code assumes 32-bit integers.)

of the ziggurat algorithm and overview of the method. * Describes the hazards of using the least-significant bits of the integer random number generator to choose the layer number.

By Cleve Moler, MathWorks, describing the ziggurat algorithm introduced in

MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...

version 5, 2001.
The Ziggurat Random Normal Generator
Blogs of MathWorks, posted by Cleve Moler, May 18, 2015. * Comparison of several algorithms for generating Gaussian random numbers. * . Illustrates problems with underlying uniform pseudo-random number generators and how those problems affect the ziggurat algorithm's output. * * {{DEFAULTSORT:Ziggurat Algorithm Pseudorandom number generators Non-uniform random numbers Statistical algorithms