In probability theory and statistics, the index of dispersion, dispersion index, coefficient of dispersion, relative variance, or variance-to-mean ratio (VMR), like the coefficient of variation, is a normalized measure of the dispersion of a probability distribution: it is a measure used to quantify whether a set of observed occurrences are clustered or dispersed compared to a standard statistical model.
It is defined as the ratio of the variance ''σ''² to the mean ''μ'':
:D = \frac{\sigma^2}{\mu}.
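The definition can be sketched for a finite sample as follows. This is a minimal illustration rather than a reference implementation: the function name `variance_to_mean_ratio` is hypothetical, and the choice of the unbiased sample variance (denominator ''n'' − 1) is an assumption, since the definition above is stated for a distribution rather than a sample.

```python
from statistics import mean, variance


def variance_to_mean_ratio(data):
    """Return the sample variance divided by the sample mean.

    Assumption: the unbiased sample variance (n - 1 denominator) is used.
    """
    m = mean(data)
    if m == 0:
        # The ratio is undefined when the mean is zero.
        raise ValueError("VMR is undefined when the mean is zero")
    return variance(data) / m


# Hypothetical counts, used only for illustration.
counts = [2, 4, 4, 6]
print(variance_to_mean_ratio(counts))
```

For these illustrative counts the mean is 4 and the sample variance is 8/3, so the ratio is 2/3.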
It is also known as the Fano factor, though this term is sometimes reserved for ''windowed'' data (the mean and variance are computed over a subpopulation), where the index of dispersion is used in the special case where the window is infinite. Windowing is common: the VMR is often computed over various intervals in time or small regions in space, which may be called "windows", and the resulting statistic is called the Fano factor.
It is only defined when the mean ''μ'' is non-zero, and is generally only used for positive statistics, such as count data or time between events, or where the underlying distribution is assumed to be the exponential distribution or Poisson distribution.
Terminology
In this context, the observed dataset may consist of the times of occurrence of predefined events, such as earthquakes in a given region over a given magnitude, or of the locations in geographical space of plants of a given species. Details of such occurrences are first converted into counts of the numbers of events or occurrences in each of a set of equal-sized time- or space-regions.
The above defines a ''dispersion index for counts''. A different definition applies for a ''dispersion index for intervals'', where the quantities treated are the lengths of the time-intervals between the events. Common usage is that "index of dispersion" means the dispersion index for counts.
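The conversion from event times to counts in equal-sized windows, followed by the dispersion index for counts, might look like the sketch below. The function names, the half-open window convention and the sample event times are all assumptions made for illustration; the sample variance with denominator ''n'' − 1 is likewise an assumed choice.

```python
from statistics import mean, variance


def counts_per_window(event_times, start, stop, n_windows):
    """Count events in each of n_windows equal-width windows over [start, stop)."""
    width = (stop - start) / n_windows
    counts = [0] * n_windows
    for t in event_times:
        if start <= t < stop:
            # Clamp the index to guard against floating-point rounding at the upper edge.
            index = min(int((t - start) / width), n_windows - 1)
            counts[index] += 1
    return counts


def dispersion_index_for_counts(counts):
    """Sample variance of the window counts divided by their mean."""
    return variance(counts) / mean(counts)


# Hypothetical event times (e.g. in days), for illustration only.
times = [0.5, 1.2, 1.7, 3.1, 3.3, 3.9, 5.0, 7.4]
window_counts = counts_per_window(times, start=0.0, stop=8.0, n_windows=4)
print(window_counts)
print(dispersion_index_for_counts(window_counts))
```

With four windows of width 2 the illustrative events fall into counts of 3, 3, 1 and 1, giving a mean of 2, a sample variance of 4/3 and an index of 2/3.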
Interpretation
Some distributions, most notably the Poisson distribution, have equal variance and mean, giving them a VMR = 1. The geometric distribution and the negative binomial distribution have VMR > 1, while the binomial distribution has VMR < 1, and the constant random variable has VMR = 0. This yields the following classification:
* constant random variable: VMR = 0 (not dispersed)
* binomial distribution: 0 < VMR < 1 (under-dispersed)
* Poisson distribution: VMR = 1
* negative binomial distribution: VMR > 1 (over-dispersed)
This can be considered analogous to the classification of conic sections by eccentricity; see Cumulants of particular probability distributions for details.
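The VMR values quoted for these distributions follow directly from their standard means and variances. The short derivation below is a sketch under stated parametrization assumptions: 0 < ''p'' < 1 throughout, the negative binomial counts failures before the ''r''-th success, the geometric counts failures before the first success, and the constant ''c'' is non-zero.

```latex
\begin{align*}
\text{Poisson}(\lambda):            \quad & \frac{\sigma^2}{\mu} = \frac{\lambda}{\lambda} = 1 \\
\text{Binomial}(n, p):              \quad & \frac{\sigma^2}{\mu} = \frac{np(1-p)}{np} = 1 - p < 1 \\
\text{Negative binomial}(r, p):     \quad & \frac{\sigma^2}{\mu} = \frac{r(1-p)/p^2}{r(1-p)/p} = \frac{1}{p} > 1 \\
\text{Geometric}(p):                \quad & \frac{\sigma^2}{\mu} = \frac{(1-p)/p^2}{(1-p)/p} = \frac{1}{p} > 1 \\
\text{Constant } c \neq 0:          \quad & \frac{\sigma^2}{\mu} = \frac{0}{c} = 0
\end{align*}
```

Under other parametrizations (for example, a geometric distribution counting trials rather than failures) the ratio takes a different form, so the inequalities should be checked against the convention in use.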
The relevance of the index of dispersion is that it has a value of 1 when the probability distribution of the number of occurrences in an interval is a Poisson distribution. Thus the measure can be used to assess whether observed data can be modeled using a Poisson process. When the coefficient of dispersion is less than 1, a dataset is said to be "under-dispersed": this condition can relate to patterns of occurrence that are more regular than the randomness associated with a Poisson process. For instance, regular, periodic events will be under-dispersed. If the index of dispersion is larger than 1, a dataset is said to be over-dispersed.
A sample-based estimate of the dispersion index can be used to construct a formal statistical hypothesis test for the adequacy of the model that a series of counts follow a Poisson distribution. In terms of the interval-counts, over-dispersion corresponds to there being more intervals with low counts and more intervals with high counts, compared to a Poisson distribution; in contrast, under-dispersion is characterised by there being more intervals having counts close to the mean count, compared to a Poisson distribution.
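One common form of such a test, sketched here as an assumption rather than as the specific procedure the text has in mind, computes the sum of squared deviations from the sample mean divided by the sample mean (equivalently, ''n'' − 1 times the sample VMR) and compares it with a χ² distribution with ''n'' − 1 degrees of freedom. The function name below is hypothetical, and only the statistic and its degrees of freedom are computed; the p-value is left to a χ² table or a statistics library.

```python
from statistics import mean


def poisson_dispersion_statistic(counts):
    """Return (statistic, degrees_of_freedom) for a Poisson dispersion test.

    statistic = sum((x - mean)^2) / mean, which equals (n - 1) * sample VMR.
    Under the Poisson hypothesis it is approximately chi-squared with n - 1
    degrees of freedom when n is large and the mean is not too small.
    """
    n = len(counts)
    m = mean(counts)
    if n < 2 or m == 0:
        raise ValueError("need at least two counts and a non-zero mean")
    statistic = sum((x - m) ** 2 for x in counts) / m
    return statistic, n - 1


# Hypothetical interval counts, for illustration only.
stat, df = poisson_dispersion_statistic([3, 3, 1, 1])
print(stat, df)
```

A statistic far in the upper tail of the reference distribution would point to over-dispersion and one far in the lower tail to under-dispersion; if SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the upper-tail probability.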
The VMR is also a good measure of the degree of randomness of a given phenomenon. For example, this technique is commonly used in currency management.
Example
For randomly diffusing particles (Brownian motion), the distribution of the number of particles inside a given volume is Poissonian, i.e. VMR = 1. Therefore, to assess whether a given spatial pattern (assuming you have a way to measure it) is due purely to diffusion or whether some particle-particle interaction is involved: divide the space into patches, quadrats or sample units (SU), count the number of individuals in each patch or SU, and compute the VMR. VMRs significantly higher than 1 denote a clustered distribution, where random walk is not enough to smother the attractive inter-particle potential.
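The quadrat procedure described above can be sketched as follows. The assumptions are that the points lie in the unit square, that the space is divided into a ''k'' × ''k'' grid of equal cells, and that the sample variance with denominator ''n'' − 1 is used; the function names and the two synthetic patterns are hypothetical and chosen only to show the two extremes.

```python
from statistics import mean, variance


def quadrat_counts(points, k):
    """Count points of the unit square [0, 1) x [0, 1) in each cell of a k-by-k grid."""
    counts = [0] * (k * k)
    for x, y in points:
        # Clamp indices so that a coordinate equal to 1.0 falls in the last cell.
        col = min(int(x * k), k - 1)
        row = min(int(y * k), k - 1)
        counts[row * k + col] += 1
    return counts


def vmr(counts):
    """Sample variance of the quadrat counts divided by their mean."""
    return variance(counts) / mean(counts)


# Perfectly regular pattern: one point at the centre of each cell of a 4x4 grid.
regular = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
# Strongly clustered pattern: all 16 points packed into a single cell.
clustered = [(0.1 + 0.001 * i, 0.1) for i in range(16)]

print(vmr(quadrat_counts(regular, 4)))
print(vmr(quadrat_counts(clustered, 4)))
```

The regular pattern gives identical counts in every quadrat and hence a VMR of 0, while the clustered pattern gives a VMR well above 1. Real data would fall between these extremes, and the result depends on the chosen quadrat size.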
History
The first to discuss the use of a test to detect deviations from a Poisson or binomial distribution appears to have been Lexis in 1877. One of the tests he developed was the Lexis ratio.
This index was first used in botany by Clapham in 1936.
If the variates are Poisson distributed, then the index of dispersion, scaled by ''n'' − 1, is approximately distributed as a χ² statistic with ''n'' − 1 degrees of freedom when ''n'' is large and ''μ'' > 3.
For many cases of interest this approximation is accurate, and Fisher in 1950 derived an exact test for it.
Hoel studied the first four moments of its distribution.
He found that the approximation to the χ² statistic is reasonable if ''μ'' > 5.
Skewed distributions
For highly skewed distributions, it may be more appropriate to use a linear loss function, as opposed to a quadratic one. The analogous coefficient of dispersion in this case is the ratio of the average absolute deviation from the median to the median of the data, or, in symbols:
:CD = \frac{1}{n} \frac{\sum_j |m - x_j|}{m}
where ''n'' is the sample size, ''m'' is the sample median and the sum is taken over the whole sample.
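The linear coefficient of dispersion described above (average absolute deviation from the median, divided by the median) can be computed with a short sketch like the one below. The function name and the sample data are hypothetical; the guard against a zero median is an added assumption, since the ratio is undefined in that case.

```python
from statistics import median


def linear_coefficient_of_dispersion(data):
    """Average absolute deviation from the median, divided by the median."""
    m = median(data)
    if m == 0:
        raise ValueError("coefficient is undefined when the median is zero")
    n = len(data)
    return sum(abs(m - x) for x in data) / (n * m)


# Hypothetical right-skewed sample, for illustration only.
sample = [1, 2, 3, 4, 10]
print(linear_coefficient_of_dispersion(sample))
```

For this sample the median is 3 and the absolute deviations sum to 11, so the coefficient is 11/15. The single large value affects the result less than it would affect a variance-based measure.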
Iowa, New York and South Dakota use this linear coefficient of dispersion to estimate dues taxes.
For a two-sample test in which the sample sizes are large and both samples have the same median but differ in the dispersion around it, a confidence interval for the linear coefficient of dispersion is bounded below by
:
where ''t<sub>j</sub>'' is the mean absolute deviation of the ''j''th sample and ''z<sub>α</sub>'' is the confidence interval length for a normal distribution of confidence ''α'' (e.g., for ''α'' = 0.05, ''z<sub>α</sub>'' = 1.96).
See also
* Count data
* Harmonic mean
Similar ratios
* Coefficient of variation
* Standardized moment
* Fano factor (windowed VMR)
* Signal-to-noise ratio (in signal processing)