The Wald–Wolfowitz runs test (or simply runs test), named after statisticians Abraham Wald and Jacob Wolfowitz, is a non-parametric statistical test that checks a randomness hypothesis for a two-valued data sequence. More precisely, it can be used to test the hypothesis that the elements of the sequence are mutually independent.
Definition
A ''run'' of a sequence is a maximal non-empty segment of the sequence consisting of adjacent equal elements. For example, the 21-element-long sequence
: + + + + − − − + + + − + + + + + + − − − −
consists of 6 runs, with lengths 4, 3, 3, 1, 6, and 4. The runs test is based on the null hypothesis that each element in the sequence is independently drawn from the same distribution.
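The definition of a run can be checked mechanically. The following is a minimal sketch in Python; the helper name `run_lengths` is an illustrative choice, and the example sequence is written with ASCII "+" and "-" characters.

```python
from itertools import groupby

def run_lengths(sequence):
    """Return the lengths of the maximal runs of equal adjacent elements."""
    # groupby splits the sequence exactly at the points where the value changes.
    return [sum(1 for _ in group) for _, group in groupby(sequence)]

# The 21-element example sequence, written with ASCII characters.
example = "++++---+++-++++++----"
lengths = run_lengths(example)
print(len(lengths), lengths)  # 6 [4, 3, 3, 1, 6, 4]
```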
Under the null hypothesis, the number of runs in a sequence of ''N'' elements (''N'' is the number of elements, not the number of runs) is a random variable whose conditional distribution, given the observation of <math>N_+</math> positive values (the number of elements with positive values, not the number of positive runs) and <math>N_-</math> negative values (<math>N = N_+ + N_-</math>), is approximately normal, with:
: mean <math>\mu = \frac{2 N_+ N_-}{N} + 1</math>
: variance <math>\sigma^2 = \frac{2 N_+ N_- (2 N_+ N_- - N)}{N^2 (N - 1)} = \frac{(\mu - 1)(\mu - 2)}{N - 1}</math>
Equivalently, writing <math>x_i = +1</math> or <math>x_i = -1</math> for the sign of the ''i''-th element, the number of runs is
: <math>R = \frac{1}{2}\left(N_+ + N_- + 1 - \sum_{i=1}^{N-1} x_i x_{i+1}\right)</math>.
These parameters do not assume that the positive and negative elements have equal probabilities of occurring, but only assume that the elements are independent and identically distributed. If the number of runs is significantly higher or lower than expected, the hypothesis of statistical independence of the elements may be rejected.
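As a rough illustration of the decision rule, the sketch below computes the number of runs, the standard conditional mean 2''N''<sub>+</sub>''N''<sub>−</sub>/''N'' + 1 and variance 2''N''<sub>+</sub>''N''<sub>−</sub>(2''N''<sub>+</sub>''N''<sub>−</sub> − ''N'') / (''N''<sup>2</sup>(''N'' − 1)), and a two-sided p-value from the normal approximation. The function name `runs_test` and its return layout are illustrative assumptions, not a reference implementation, and the normal approximation may be poor for short sequences.

```python
import math

def runs_test(signs):
    """Two-sided Wald-Wolfowitz runs test using the normal approximation.

    `signs` is a sequence containing exactly two distinct values.
    Returns (runs, mean, variance, z, p_value).
    """
    signs = list(signs)
    n = len(signs)
    if n < 3:
        raise ValueError("need at least 3 elements for a non-zero variance")
    n_pos = sum(1 for s in signs if s == signs[0])  # count of the first symbol
    n_neg = n - n_pos                                # count of the other symbol
    if n_neg == 0:
        raise ValueError("both symbols must occur at least once")
    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2 * n_pos * n_neg / n + 1
    variance = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n**2 * (n - 1))
    z = (runs - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return runs, mean, variance, z, p_value

runs, mean, variance, z, p_value = runs_test("++++---+++-++++++----")
print(runs, round(mean, 3), round(z, 2), round(p_value, 3))
```

For the 21-element example sequence this gives 6 runs against an expected value of about 10.9, with ''z'' near −2.34 and a p-value of roughly 0.02, so independence would be rejected at the 5% level under this approximation.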
Proofs
Moments
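The number of runs is <math>R = \frac{1}{2}\left(N_+ + N_- + 1 - \sum_{i=1}^{N-1} x_i x_{i+1}\right)</math>, with <math>x_i = \pm 1</math> encoding the sign of the ''i''-th element. By independence, every arrangement of the signs is equally likely given <math>N_+</math> and <math>N_-</math>, so each product <math>x_i x_{i+1}</math> has the same expectation and
: <math>E[R] = \frac{1}{2}\left(N + 1 - (N - 1)\, E[x_1 x_2]\right)</math>.
Writing out all possibilities, we find
: <math>x_1 x_2 = \begin{cases} +1 & \text{with probability } \frac{N_+(N_+ - 1) + N_-(N_- - 1)}{N(N - 1)} \\ -1 & \text{with probability } \frac{2 N_+ N_-}{N(N - 1)} \end{cases}</math>
Thus,
: <math>E[x_1 x_2] = \frac{(N_+ - N_-)^2 - N}{N(N - 1)}</math>.
Now simplify the expression to get
: <math>E[R] = \frac{2 N_+ N_-}{N} + 1</math>.
Similarly, the variance of the number of runs is
: <math>\operatorname{Var}[R] = \frac{1}{4} \operatorname{Var}\left[\sum_{i=1}^{N-1} x_i x_{i+1}\right] = \frac{1}{4}\left((N - 1)\, E[x_1 x_2 x_1 x_2] + 2(N - 2)\, E[x_1 x_2 x_2 x_3] + (N - 2)(N - 3)\, E[x_1 x_2 x_3 x_4] - (N - 1)^2\, E[x_1 x_2]^2\right)</math>
and simplifying, we obtain the variance.
Similarly we can calculate all moments of <math>R</math>, but the algebra becomes increasingly unwieldy.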
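The moment calculations can be sanity-checked by brute force for small sequences. The sketch below enumerates every arrangement of a given number of +1 and −1 values, treats all arrangements as equally likely, and compares the exact mean and variance of the number of runs with the closed forms 2''N''<sub>+</sub>''N''<sub>−</sub>/''N'' + 1 and 2''N''<sub>+</sub>''N''<sub>−</sub>(2''N''<sub>+</sub>''N''<sub>−</sub> − ''N'') / (''N''<sup>2</sup>(''N'' − 1)). It also checks the identity expressing the number of runs through the sum of products of adjacent signs. The function names are illustrative, and this is a numerical check, not a proof.

```python
from fractions import Fraction
from itertools import combinations

def exact_moments(n_pos, n_neg):
    """Exact mean and variance of the run count over all arrangements
    of n_pos values +1 and n_neg values -1 (each equally likely)."""
    n = n_pos + n_neg
    counts = []
    for pos_idx in combinations(range(n), n_pos):
        chosen = set(pos_idx)
        x = [1 if i in chosen else -1 for i in range(n)]
        runs = 1 + sum(1 for a, b in zip(x, x[1:]) if a != b)
        # Identity: R = (N + 1 - sum of x_i * x_{i+1}) / 2
        assert 2 * runs == n + 1 - sum(a * b for a, b in zip(x, x[1:]))
        counts.append(runs)
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(r * r for r in counts), len(counts)) - mean**2
    return mean, var

def formula_moments(n_pos, n_neg):
    """Closed-form conditional mean and variance of the run count."""
    n = n_pos + n_neg
    mean = Fraction(2 * n_pos * n_neg, n) + 1
    var = Fraction(2 * n_pos * n_neg * (2 * n_pos * n_neg - n), n * n * (n - 1))
    return mean, var

# Exact rational arithmetic, so the comparison is an equality, not a tolerance.
for n_pos, n_neg in [(2, 1), (2, 3), (4, 4), (5, 2)]:
    assert exact_moments(n_pos, n_neg) == formula_moments(n_pos, n_neg)
print(exact_moments(2, 1))  # (Fraction(7, 3), Fraction(2, 9))
```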
Asymptotic normality
Theorem. If we sample longer and longer sequences, with <math>\lim N_+/N = p</math> for some fixed <math>p \in (0, 1)</math>, then <math>(R - \mu)/\sigma</math> converges in distribution to the normal distribution with mean 0 and variance 1.
Proof sketch. It suffices to prove the asymptotic normality of the sequence <math>\sum_{i=1}^{N-1} x_i x_{i+1}</math>, which can be proven by a martingale central limit theorem.
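The limiting behaviour can be illustrated, though not proven, by simulation. The sketch below shuffles a fixed number of plus and minus signs many times, standardizes the number of runs with the conditional mean and variance, and reports the sample mean, sample variance, and the fraction of standardized values within ±1.96. The sequence sizes, trial count, and seed are arbitrary choices made for this example.

```python
import math
import random

def standardized_runs(n_pos, n_neg, rng):
    """Shuffle n_pos plus signs and n_neg minus signs; return (R - mu) / sigma."""
    n = n_pos + n_neg
    x = [1] * n_pos + [-1] * n_neg
    rng.shuffle(x)  # a uniformly random arrangement of the fixed signs
    runs = 1 + sum(1 for a, b in zip(x, x[1:]) if a != b)
    mu = 2 * n_pos * n_neg / n + 1
    sigma2 = 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n**2 * (n - 1))
    return (runs - mu) / math.sqrt(sigma2)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
zs = [standardized_runs(120, 80, rng) for _ in range(5000)]
sample_mean = sum(zs) / len(zs)
sample_var = sum((z - sample_mean) ** 2 for z in zs) / (len(zs) - 1)
inside = sum(1 for z in zs if abs(z) < 1.96) / len(zs)
print(round(sample_mean, 2), round(sample_var, 2), round(inside, 3))
```

If the normal approximation is reasonable at this sequence length, the three printed values should be close to 0, 1, and 0.95 respectively.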
Applications
Runs tests can be used to test:
#the randomness of a distribution, by taking the data in the given order and marking with + the data greater than the median, and with − the data less than the median (numbers equalling the median are omitted).
#whether a function fits well to a data set, by marking the data exceeding the function value with + and the other data with −. For this use, the runs test, which takes into account the signs but not the distances, is complementary to the chi-squared test, which takes into account the distances but not the signs.
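Both applications reduce numeric data to a two-valued sequence before the runs are counted. The sketch below shows one way to do that reduction. The function names and the small data sets are hypothetical and serve only as illustration.

```python
import statistics

def signs_about_median(data):
    """Mark values above the median "+", below it "-", and drop ties."""
    med = statistics.median(data)
    return ["+" if v > med else "-" for v in data if v != med]

def residual_signs(xs, ys, f):
    """Mark points whose observed value exceeds f(x) with "+", others with "-"."""
    return ["+" if y > f(x) else "-" for x, y in zip(xs, ys)]

# Hypothetical data: a steadily rising series gives very few runs about the median.
trend = [1, 2, 3, 4, 5, 6, 7, 8]
print("".join(signs_about_median(trend)))  # ----++++

# Hypothetical fit: a straight line through curved data leaves long runs of residual signs.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]
print("".join(residual_signs(xs, ys, lambda x: 6 * x - 3)))  # +-----+
```

In both cases the resulting sign sequence has far fewer runs than would be expected for independent elements, which is the pattern the runs test is designed to detect.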
Related tests
The Kolmogorov–Smirnov test has been shown to be more powerful than the Wald–Wolfowitz test for detecting differences between distributions that differ solely in their location. However, the reverse is true if the distributions differ in variance and have at most a small difference in location.
The Wald–Wolfowitz runs test has been extended for use with several samples.