In probability theory and statistics, the index of dispersion, dispersion index, coefficient of dispersion, relative variance, or variance-to-mean ratio (VMR), like the coefficient of variation, is a normalized measure of the dispersion of a probability distribution: it is a measure used to quantify whether a set of observed occurrences are clustered or dispersed compared to a standard statistical model.

It is defined as the ratio of the variance \sigma^2 to the mean \mu,
: D = \frac{\sigma^2}{\mu}.

It is also known as the Fano factor, though this term is sometimes reserved for ''windowed'' data (the mean and variance are computed over a subpopulation), with "index of dispersion" then referring to the special case where the window is infinite. Windowing is common: the VMR is frequently computed over various intervals in time or small regions in space, which may be called "windows", and the resulting statistic is called the Fano factor.

It is only defined when the mean \mu is non-zero, and is generally only used for positive statistics, such as count data or time between events, or where the underlying distribution is assumed to be the exponential distribution or Poisson distribution.
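
As a minimal sketch of the definition, the ratio can be computed directly from a list of numbers. The function name `variance_to_mean_ratio` and the sample data are illustrative assumptions, and the population variance is used here to mirror \sigma^2; a sample-based estimate would typically use the n - 1 divisor instead.

```python
from statistics import fmean, pvariance


def variance_to_mean_ratio(values):
    """Return D = sigma^2 / mu for a sequence of numbers (hypothetical helper)."""
    mu = fmean(values)
    if mu == 0:
        # The index is only defined for a non-zero mean.
        raise ValueError("VMR is undefined when the mean is zero")
    # Population variance (divisor n), matching the definition of sigma^2.
    return pvariance(values, mu) / mu


# Illustrative data only: mean 5, population variance 4.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_to_mean_ratio(counts))
```

A value below 1, as this made-up sample gives, would be read as less dispersed than a Poisson model with the same mean.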


Terminology

In this context, the observed dataset may consist of the times of occurrence of predefined events, such as earthquakes in a given region over a given magnitude, or of the locations in geographical space of plants of a given species. Details of such occurrences are first converted into counts of the numbers of events or occurrences in each of a set of equal-sized time- or space-regions. The above defines a ''dispersion index for counts''. A different definition applies for a ''dispersion index for intervals'', where the quantities treated are the lengths of the time-intervals between the events. Common usage is that "index of dispersion" means the dispersion index for counts.
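
The conversion from event times to counts described above can be sketched as follows. The helper names, the observation period and the event times are all assumptions made for illustration; the sketch only covers the dispersion index for counts, not the one for intervals.

```python
from statistics import fmean, pvariance


def counts_per_window(event_times, start, stop, n_windows):
    """Count events in each of n_windows equal-width windows on [start, stop)."""
    width = (stop - start) / n_windows
    counts = [0] * n_windows
    for t in event_times:
        if start <= t < stop:
            # Guard against floating-point rounding at the upper edge.
            idx = min(int((t - start) / width), n_windows - 1)
            counts[idx] += 1
    return counts


def dispersion_index_for_counts(counts):
    """Variance-to-mean ratio of the per-window counts."""
    mu = fmean(counts)
    if mu == 0:
        raise ValueError("no events observed; the index is undefined")
    return pvariance(counts, mu) / mu


# Hypothetical event times over a period of length 10, split into 5 windows.
times = [0.5, 1.2, 1.9, 3.1, 3.3, 3.8, 6.4, 9.7]
counts = counts_per_window(times, 0.0, 10.0, 5)
print(counts, dispersion_index_for_counts(counts))
```

The result depends on the chosen window width, so the width should be reported alongside the index.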


Interpretation

Some distributions, most notably the Poisson distribution, have equal variance and mean, giving them a VMR = 1. The geometric distribution and the negative binomial distribution have VMR > 1, while the binomial distribution has VMR < 1, and the constant random variable has VMR = 0. This yields the following classification:

* constant random variable: VMR = 0 (not dispersed)
* binomial distribution: 0 < VMR < 1 (under-dispersed)
* Poisson distribution: VMR = 1
* negative binomial distribution: VMR > 1 (over-dispersed)

This can be considered analogous to the classification of conic sections by eccentricity; see Cumulants of particular probability distributions for details.

The relevance of the index of dispersion is that it has a value of 1 when the probability distribution of the number of occurrences in an interval is a Poisson distribution. Thus the measure can be used to assess whether observed data can be modeled using a Poisson process. When the index of dispersion is less than 1, a dataset is said to be "under-dispersed": this condition can relate to patterns of occurrence that are more regular than the randomness associated with a Poisson process. For instance, regular, periodic events will be under-dispersed. If the index of dispersion is larger than 1, a dataset is said to be "over-dispersed".

A sample-based estimate of the dispersion index can be used to construct a formal statistical hypothesis test for the adequacy of the model that a series of counts follow a Poisson distribution. In terms of the interval-counts, over-dispersion corresponds to there being more intervals with low counts and more intervals with high counts, compared to a Poisson distribution; in contrast, under-dispersion is characterised by there being more intervals having counts close to the mean count, compared to a Poisson distribution.

The VMR is also used as a measure of the degree of randomness of a given phenomenon. For example, this technique is commonly used in currency management.
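
The text does not spell out the hypothesis test, so the following is only a sketch of one common construction, stated here as an assumption: under the Poisson hypothesis, (n - 1) times the sample variance-to-mean ratio is approximately chi-squared distributed with n - 1 degrees of freedom. The function name and data are hypothetical, and the comparison against chi-squared quantiles (from a table or a statistics library) is deliberately left out.

```python
from statistics import fmean, variance


def poisson_dispersion_statistic(counts):
    """Return (statistic, degrees_of_freedom) for a dispersion test sketch."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two counts")
    mean = fmean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; the index is undefined")
    # Sample variance (divisor n - 1) over the sample mean.
    d_hat = variance(counts, mean) / mean
    # Assumed reference distribution: chi-squared with n - 1 degrees of freedom.
    return (n - 1) * d_hat, n - 1


# Illustrative counts only.
stat, dof = poisson_dispersion_statistic([3, 3, 0, 1, 1])
print(stat, dof)
```

A statistic far above the degrees of freedom would point towards over-dispersion, and one far below towards under-dispersion.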


Example

For randomly diffusing particles (Brownian motion), the distribution of the number of particles inside a given volume is Poissonian, i.e. VMR = 1. Therefore, to assess whether a given spatial pattern (assuming you have a way to measure it) is due purely to diffusion or whether some particle-particle interaction is involved: divide the space into patches, quadrats or sample units (SU), count the number of individuals in each patch or SU, and compute the VMR. VMRs significantly higher than 1 denote a clustered distribution, where the random walk is not enough to smother the attractive inter-particle potential.
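
The quadrat procedure above can be sketched on simulated points in the unit square. Everything specific here is an assumption for illustration: the function name `quadrat_vmr`, the 10-by-10 grid, the point counts, and the Gaussian clusters that stand in for an attractive interaction.

```python
import random
from statistics import fmean, pvariance


def quadrat_vmr(points, k):
    """VMR of point counts over a k-by-k grid of quadrats on the unit square."""
    counts = [0] * (k * k)
    for x, y in points:
        # Clamp the index so a coordinate that rounds to 1.0 stays in range.
        col = min(int(x * k), k - 1)
        row = min(int(y * k), k - 1)
        counts[row * k + col] += 1
    mu = fmean(counts)
    if mu == 0:
        raise ValueError("no points; the VMR is undefined")
    return pvariance(counts, mu) / mu


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Independent uniform points: a stand-in for purely diffusive behaviour.
uniform = [(rng.random(), rng.random()) for _ in range(2000)]

# Points scattered tightly around 20 random centres (wrapped into the square).
clustered = []
for _ in range(20):
    cx, cy = rng.random(), rng.random()
    for _ in range(100):
        clustered.append((rng.gauss(cx, 0.02) % 1.0, rng.gauss(cy, 0.02) % 1.0))

print(quadrat_vmr(uniform, 10), quadrat_vmr(clustered, 10))
```

The uniform pattern should give a VMR near 1 and the clustered pattern a much larger one; the exact values depend on the quadrat size, which is a design choice of the analysis.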


History

The first to discuss the use of a test to detect deviations from a Poisson or binomial distribution appears to have been Lexis in 1877. One of the tests he developed was the Lexis ratio. This index was first used in botany by Clapham in 1936. Hoel studied the first four moments of its distribution. He found that the approximation to the χ² statistic is reasonable if ''μ'' > 5.


Skewed distributions

For highly skewed distributions, it may be more appropriate to use a linear loss function, as opposed to a quadratic one. The analogous coefficient of dispersion in this case is the ratio of the average absolute deviation from the median to the median of the data, or, in symbols:
: CD = \frac{1}{n} \frac{\sum_j |m - x_j|}{m}
where ''n'' is the sample size, ''m'' is the sample median and the sum is taken over the whole sample.
Iowa, New York and South Dakota use this linear coefficient of dispersion to estimate taxes due.

For a two-sample test in which the sample sizes are large, both samples have the same median, and differ in the dispersion around it, a confidence interval for the linear coefficient of dispersion is bounded below by
: \frac{t_a}{t_b} \exp\left(-z_\alpha \sqrt{\operatorname{var}\left[\log\left(\frac{t_a}{t_b}\right)\right]}\right)
where ''t''<sub>''j''</sub> is the mean absolute deviation of the ''j''th sample and ''z''<sub>''α''</sub> is the critical value of the standard normal distribution at significance level ''α'' (e.g., for a two-sided ''α'' = 0.05, ''z''<sub>''α''</sub> = 1.96).
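
The linear coefficient of dispersion defined at the start of this section (average absolute deviation from the median, divided by the median) can be sketched as follows. The function name and the sample values are illustrative assumptions; the confidence bound is not implemented here.

```python
from statistics import fmean, median


def linear_coefficient_of_dispersion(sample):
    """Average absolute deviation from the median, divided by the median."""
    m = median(sample)
    if m == 0:
        raise ValueError("median is zero; the coefficient is undefined")
    deviations = [abs(m - x) for x in sample]
    return fmean(deviations) / m


# Illustrative skewed sample: median 3, absolute deviations 2, 1, 0, 1, 7.
print(linear_coefficient_of_dispersion([1, 2, 3, 4, 10]))
```

Because it uses the median and absolute deviations, a single extreme value such as the 10 in this sample affects the result less than it would affect the variance-to-mean ratio.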


See also

* Count data
* Harmonic mean


Similar ratios

* Coefficient of variation, \sigma/\mu
* Standardized moment, \mu_k/\sigma^k
* Fano factor, \sigma^2_W/\mu_W (windowed VMR)
* Signal-to-noise ratio, \mu/\sigma (in signal processing)

