History
An initial battery of randomness tests for RNGs was suggested in the 1969 first edition of ''The Art of Computer Programming'' by Donald Knuth (Volume 2, Chapter 3.3: Statistical tests). Knuth's tests were then supplanted by George Marsaglia's Diehard tests (1995) consisting of fifteen different tests. The inability to modify the test parameters or add new tests led to the development of the TestU01 library, introduced in 2007 by Pierre L’Ecuyer and Richard Simard of theBirthday spacings
Choose random points on a large interval. The spacings between the points should be asymptotically exponentially distributed. The name is based on theOverlapping permutations
Analyze sequences of five consecutive random numbers. The 120 possible orderings should occur with statistically equal probability. This is the OPERM5 test. It looks at a sequence of one million 32-bit random integers. Each set of five consecutive integers can be in one of 120 states, for the 5! possible orderings of five numbers. Thus the 5th, 6th, 7th, ... numbers each provide a state. As many thousands of state transitions are observed, cumulative counts are made of the number of occurrences of each state. Then theRanks of matrices
Select some number of bits from some number of random numbers to form a31×31
The leftmost 31 bits of 31 random integers from the test sequence are used to form a 31×31 binary matrix over the field . The rank is determined. That rank can be from 0 to 31, but ranks < 28 are rare, and their counts are pooled with those for rank 28. Ranks are found for 40000 such random matrices and a chi-square test is performed on counts for ranks 31, 30, 29 and ≤ 28.32×32
A random 32×32 binary matrix is formed, each row a 32-bit random6×8
From each of six random 32-bit integers from the generator under test, a specified byte is chosen, and the resulting six bytes form a 6×8 binary matrix whose rank is determined. That rank can be from 0 to 6, but ranks 0, 1, 2, 3 are rare; their counts are pooled with those for rank 4. Ranks are found for 100000 random matrices, and a chi square test is performed on counts for ranks 6, 5 and ≤ 4.Monkey tests
Treat sequences of some number of bits as "words". Count the overlapping words in a stream. The number of "words" that do not appear should follow a known distribution. The name is based on theCount the 1s
Count the 1 bits in each of either successive or chosen bytes. Convert the counts to "letters", and count the occurrences of five-letter "words".Successive bytes
Consider the file under test as a stream of bytes (four per 32-bit integer). Each byte can contain from none to eight 1s, with probabilities 1, 8, 28, 56, 70, 56, 28, 8, 1 over 256. Now let the stream of bytes provide a string of overlapping 5-letter words, each "letter" taking values A, B, C, D, E. The letters are determined by the number of 1s in a byte 0, 1, or 2 yield A, 3 yields B, 4 yields C, 5 yields D and 6, 7 or 8 yield E. Thus we have a monkey at a typewriter hitting five keys with various probabilities (37, 56, 70, 56, 37 over 256). There are 55 possible 5-letter words, and from a string of 256000 (overlapping) 5-letter words, counts are made on the frequencies for each word. The quadratic form in the weak inverse of the covariance matrix of the cell counts provides a chisquare test Q5–Q4, the difference of the naive Pearson sums of on counts for 5- and 4-letter cell counts.Chosen bytes
Consider the file under test as a stream of 32-bit integers. From each integer, a specific byte is chosen, say the leftmost bits 1 to 8. Each byte can contain from 0 to 8 1s, with probabilities 1, 8, 28, 56, 70, 56, 28, 8, 1 over 256. Now let the specified bytes from successive integers provide a string of (overlapping) 5-letter words, each "letter" taking values A, B, C, D, E. The letters are determined by the number of 1s, in that byte 0, 1, or 2 → A, 3 → B, 4 → C, 5 → D, and 6, 7 or 8 → E. Thus we have a monkey at a typewriter hitting five keys with various probabilities 37, 56, 70, 56, 37 over 256. There are 5 possible 5-letter words, and from a string of 256000 (overlapping) 5-letter words, counts are made on the frequencies for each word. The quadratic form in the weak inverse of the covariance matrix of the cell counts provides a chisquare test Q5 – Q4, the difference of the naive Pearson sums of on counts for 5- and 4-letter cell counts.Parking lot test
Randomly place unit circles in a 100×100 square. A circle is successfully parked if it does not overlap an existing successfully parked one. After 12,000 tries, the number of successfully parked circles should follow a certainMinimum distance test
Randomly place 8000 points in a 10000×10000 square, then find the minimum distance between the pairs. The square of this distance should be exponentially distributed with a certain mean. It does this 100 times choose ''n'' = 8000 random points in a square of side 10000. Find ''d'', the minimum distance between the pairs of points. If the points are truly independent uniform, then , the square of the minimum distance should be (very close to) exponentially distributed with mean 0.995. Thus should be uniform on [0,1) and a KSTEST on the resulting 100 values serves as a test of uniformity for random points in the square. Test numbers = 0 mod 5 are printed but the KSTEST is based on the full set of 100 random choices of 8000 points in the 10000×10000 square.Random spheres test
Randomly choose 4000 points in a cube of edge 1000. Center a sphere on each point, whose radius is the minimum distance to another point. The smallest sphere's volume should be exponentially distributed with a certain mean. Choose 4000 random points in a cube of edge 1000. At each point, center a sphere large enough to reach the next closest point. Then the volume of the smallest such sphere is (very close to) exponentially distributed with mean 120 / 3. Thus the radius cubed is exponential with mean 30. (The mean is obtained by extensive simulation). The 3D spheres test generates 4000 such spheres 20 times. Each min radius cubed leads to a uniform variable by means of , then a KSTEST is done on the 20 ''p''-values.Squeeze test
Multiply 231 by random floats on until you reach 1. Repeat this 100000 times. The number of floats needed to reach 1 should follow a certain distribution. Random integers are floated to get uniforms on [0,1). Starting with ''k'' = 2 = 2147483648, the test finds , the number of iterations necessary to reduce ''k'' to 1, using the reduction ''k'' = ceiling(''k''×''U''), with ''U'' provided by floating integers from the file being tested. Such s are found 100000 times, then counts for the number of times was are used to provide a chi-square test for cell frequencies.Overlapping sums test
Generate a long sequence of random floats on . Add sequences of 100 consecutive floats. The sums should be normally distributed with characteristic mean and variance. Integers are floated to get a sequence ''U''(1), ''U''(2), ... of uniform [0,1) variables. Then overlapping sums, ''S''(1) = ''U''(1) + ... + ''U''(100), ''S''(2) = ''U''(2) + ... + ''U''(101), ... are formed. The ''S''s are virtually normal with a certain covariance matrix. A linear transformation of the ''S''s converts them to a sequence of independent standard normals, which are converted to uniform variables for a KSTEST. The ''p''-values from ten KSTESTs are given still another KSTEST.Wald–Wolfowitz runs test
Generate a long sequence of random floats on . Count ascending and descending runs. The counts should follow a certain distribution. It counts runs up, and runs down, in a sequence of uniform [0,1) variables, obtained by floating the 32-bit integers in the specified file. This example shows how runs are counted: 0.123, 0.357, 0.789, 0.425, 0.224, 0.416, 0.95 contains an up-run of length 3, a down-run of length 2 and an up-run of (at least) 2, depending on the next values. The covariance matrices for the runs-up and runs-down are well known, leading to chi-square tests for quadratic forms in the weak inverses of the covariance matrices. Runs are counted for sequences of length 10000. This is done ten times. Then repeated.Craps test
Play 200000 games of craps, counting the wins and the number of throws per game. Each count should follow a certain distribution. It plays 200000 games of craps, finds the number of wins and the number of throws necessary to end each game. The number of wins should be (very close to) a normal with mean 200000''p'' and variance , with . Throws necessary to complete the game can vary from 1 to infinity, but counts for all > 21 are lumped with 21. A chi-square test is made on the no.-of-throws cell counts. Each 32-bit integer from the test file provides the value for the throw of a die, by floating to [0,1), multiplying by 6 and taking 1 plus the integer part of the result.Bitstream test
The file under test is viewed as a stream of bits. Call them b, b, ... . Consider an alphabet with two "letters", 0 and 1, and think of the stream of bits as a succession of 20-letter "words", overlapping. Thus the first word is bb...b, the second is bb...b, and so on. The bitstream test counts the number of missing 20-letter (20-bit) words in a string of 2 overlapping 20-letter words. There are 2 possible 20-letter words. For a truly random string of 2 + 19 bits, the number of missing words should be (very close to) normally distributed with mean 141,909 and sigma 428. Thus should be a standard normal variate (''z'' score) that leads to a uniform [0,1) ''p'' value. The test is repeated twenty times.OPSO, OQSO and DNA tests
OPSO means overlapping-pairs-sparse-occupancy. The OPSO test considers 2-letter words from an alphabet of 1024 letters. Each letter is determined by a specified ten bits from a 32-bit integer in the sequence to be tested. OPSO generates 2 (overlapping) 2-letter words (from 2 + 1 "keystrokes") and counts the number of missing words – that is 2-letter words which do not appear in the entire sequence. That count should be very close to normally distributed with mean 141909, sigma 290. Thus (missingwrds-141909) / 290 should be a standard normal variable. The OPSO test takes 32 bits at a time from the test file and uses a designated set of ten consecutive bits. It then restarts the file for the next designated 10 bits, and so on. OQSO means overlapping-quadruples-sparse-occupancy. The test OQSO is similar, except that it considers 4-letter words from an alphabet of 32 letters, each letter determined by a designated string of 5 consecutive bits from the test file, elements of which are assumed 32-bit random integers. The mean number of missing words in a sequence of 2 four-letter words, (2 + 3 "keystrokes"), is again 141909, with sigma = 295. The mean is based on theory; sigma comes from extensive simulation. The DNA test considers an alphabet of 4 letters C, G, A, T, determined by two designated bits in the sequence of random integers being tested. It considers 10-letter words, so that as in OPSO and OQSO, there are 2 possible words, and the mean number of missing words from a string of 2 (overlapping) 10-letter words (2 + 9 "keystrokes") is 141909. The standard deviation sigma = 339 was determined as for OQSO by simulation. (Sigma for OPSO, 290, is the true value (to three places), not determined by simulation.References
Further reading
* * *External links
* {{cite web , url = http://stat.fsu.edu/pub/diehard/ , title = The Marsaglia Random Number CDROM including the Diehard Battery of Tests of Randomness , year = 1995 , publisher = Florida State University , archiveurl = https://web.archive.org/web/20160125103112/http://stat.fsu.edu/pub/diehard/ , archivedate = 2016-01-25 , url-status = dead Random number generation