
In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a method for estimating the parameters of a univariate statistical model. The method requires maximization of the geometric mean of ''spacings'' in the data, which are the differences between the values of the cumulative distribution function at neighbouring data points.
The concept underlying the method is based on the probability integral transform, in that a set of independent random samples derived from any random variable should on average be uniformly distributed with respect to the cumulative distribution function of the random variable. The MPS method chooses the parameter values that make the observed data as uniform as possible, according to a specific quantitative measure of uniformity.
One of the most common methods for estimating the parameters of a distribution from data, the method of maximum likelihood (MLE), can break down in various cases, such as those involving certain mixtures of continuous distributions. In these cases the method of maximum spacing estimation may be successful.
Apart from its use in pure mathematics and statistics, trial applications of the method have been reported using data from fields such as hydrology, econometrics, magnetic resonance imaging, and others.
History and usage
The MSE method was derived independently by Russel Cheng and Nik Amin at the University of Wales Institute of Science and Technology, and Bo Ranneby at the Swedish University of Agricultural Sciences.
The authors explained that, due to the probability integral transform at the true parameter, the “spacing” between each observation should be uniformly distributed. This would imply that the difference between the values of the cumulative distribution function at consecutive observations should be equal. This is the case that maximizes the geometric mean of such spacings, so solving for the parameters that maximize the geometric mean would achieve the “best” fit as defined this way. Ranneby justified the method by demonstrating that it is an estimator of the Kullback–Leibler divergence, similar to maximum likelihood estimation, but with more robust properties for some classes of problems.
There are certain distributions, especially those with three or more parameters, whose likelihoods may become infinite along certain paths in the parameter space. Using maximum likelihood to estimate these parameters often breaks down, with one parameter tending to the specific value that causes the likelihood to be infinite, rendering the other parameters inconsistent. The method of maximum spacings, however, being dependent on the difference between points on the cumulative distribution function and not individual likelihood points, does not have this issue, and will return valid results over a much wider array of distributions.
The distributions that tend to have likelihood issues are often those used to model physical phenomena. Hall and co-workers sought to analyze flood alleviation methods, which requires accurate models of river flood effects. The distributions that better model these effects are all three-parameter models, which suffer from the infinite likelihood issue described above, leading to Hall's investigation of the maximum spacing procedure. A later comparison of the method with maximum likelihood used various data sets, ranging from a set on the oldest ages at death in Sweden between 1905 and 1958 to a set containing annual maximum wind speeds.
Definition
Given an iid random sample of size ''n'' from a univariate distribution with continuous cumulative distribution function ''F''(''x'';''θ''<sub>0</sub>), where ''θ''<sub>0</sub> ∈ Θ is an unknown parameter to be estimated, let ''x''<sub>(1)</sub> ≤ ''x''<sub>(2)</sub> ≤ … ≤ ''x''<sub>(''n'')</sub> be the corresponding ordered sample, that is, the result of sorting all observations from smallest to largest. For convenience also denote ''x''<sub>(0)</sub> = −∞ and ''x''<sub>(''n''+1)</sub> = +∞.
Define the ''spacings'' as the “gaps” between the values of the distribution function at adjacent ordered points:
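:<math>D_i(\theta) = F\bigl(x_{(i)};\,\theta\bigr) - F\bigl(x_{(i-1)};\,\theta\bigr), \qquad i = 1, \ldots, n+1,</math>
so that ''D''<sub>1</sub>(''θ'') = ''F''(''x''<sub>(1)</sub>;''θ'') and ''D''<sub>''n''+1</sub>(''θ'') = 1 − ''F''(''x''<sub>(''n'')</sub>;''θ'') by the conventions above. The spacings are non-negative and sum to one.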
Then the maximum spacing estimator of ''θ''<sub>0</sub> is defined as a value that maximizes the logarithm of the geometric mean of sample spacings:
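:<math>\hat\theta = \underset{\theta\in\Theta}{\arg\max}\; S_n(\theta), \qquad S_n(\theta) = \ln\sqrt[\,n+1\,]{D_1(\theta)\,D_2(\theta)\cdots D_{n+1}(\theta)} = \frac{1}{n+1}\sum_{i=1}^{n+1}\ln D_i(\theta).</math>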
By the inequality of arithmetic and geometric means, function ''S''<sub>''n''</sub>(''θ'') is bounded from above by −ln(''n''+1), and thus the maximum has to exist at least in the supremum sense.
Note that some authors define the function ''S''<sub>''n''</sub>(''θ'') somewhat differently. In particular, Ranneby multiplies each ''D''<sub>''i''</sub> by a factor of (''n''+1), whereas Cheng and Stephens omit the factor in front of the sum and add the “−” sign in order to turn the maximization into minimization. As these are constants with respect to ''θ'', the modifications do not alter the location of the maximum of the function ''S''<sub>''n''</sub>.
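For illustration, the following is a minimal sketch (not taken from the cited works) of how ''S''<sub>''n''</sub>(''θ'') can be evaluated and maximized numerically for a one-parameter model; the function names, parameter bounds, and the use of SciPy's bounded scalar minimizer are illustrative choices.
<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

def log_spacing_objective(theta, data, cdf):
    """S_n(theta): mean logarithm of the CDF spacings over the ordered sample."""
    x = np.sort(np.asarray(data, dtype=float))
    # Conventions x_(0) = -inf and x_(n+1) = +inf give F = 0 and F = 1 at the ends.
    u = np.concatenate(([0.0], cdf(x, theta), [1.0]))
    spacings = np.diff(u)
    # Guard against zero spacings (ties / numerical underflow) so the log stays finite.
    spacings = np.clip(spacings, 1e-300, None)
    return np.mean(np.log(spacings))

def maximum_spacing_estimate(data, cdf, bounds):
    """Maximize S_n(theta) over a bounded parameter interval."""
    result = minimize_scalar(lambda t: -log_spacing_objective(t, data, cdf),
                             bounds=bounds, method="bounded")
    return result.x

# Usage: exponential model F(x; lam) = 1 - exp(-lam * x) with the data of Example 1 below.
exp_cdf = lambda x, lam: 1.0 - np.exp(-lam * x)
print(maximum_spacing_estimate([2.0, 4.0], exp_cdf, bounds=(1e-6, 10.0)))  # ~0.255
</syntaxhighlight>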
Examples
This section presents two examples of calculating the maximum spacing estimator.
Example 1

Suppose two values ''x''<sub>(1)</sub> = 2, ''x''<sub>(2)</sub> = 4 were sampled from the exponential distribution ''F''(''x'';''λ'') = 1 − e<sup>−''xλ''</sup>, ''x'' ≥ 0, with unknown parameter ''λ'' > 0. In order to construct the MSE we have to first find the spacings:
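:<math>\begin{array}{c|c|c}
i & F(x_{(i)};\,\lambda) & \text{difference } D_i(\lambda) \\
\hline
1 & 1 - e^{-2\lambda} & 1 - e^{-2\lambda} \\
2 & 1 - e^{-4\lambda} & e^{-2\lambda} - e^{-4\lambda} \\
3 & 1 & e^{-4\lambda}
\end{array}</math>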
The process continues by finding the ''λ'' that maximizes the geometric mean of the “difference” column. Using the convention that ignores taking the (''n''+1)st root, this turns into the maximization of the following product: (1 − e<sup>−2''λ''</sup>) · (e<sup>−2''λ''</sup> − e<sup>−4''λ''</sup>) · (e<sup>−4''λ''</sup>). Letting ''μ'' = e<sup>−2''λ''</sup>, the problem becomes finding the maximum of ''μ''<sup>5</sup> − 2''μ''<sup>4</sup> + ''μ''<sup>3</sup>. Differentiating, ''μ'' has to satisfy 5''μ''<sup>4</sup> − 8''μ''<sup>3</sup> + 3''μ''<sup>2</sup> = 0. This equation has roots 0, 0.6, and 1. As ''μ'' is actually e<sup>−2''λ''</sup>, it has to be greater than zero but less than one. Therefore, the only acceptable solution is ''μ'' = 0.6, which gives ''λ''<sub>MSE</sub> = −ln(0.6)/2 = ln(5/3)/2 ≈ 0.255. This corresponds to an exponential distribution with a mean of 1/''λ''<sub>MSE</sub> ≈ 3.915. For comparison, the maximum likelihood estimate of ''λ'' is the inverse of the sample mean (3), so ''λ''<sub>MLE</sub> = ⅓ ≈ 0.333.
Example 2
Suppose ''x''<sub>(1)</sub> ≤ ''x''<sub>(2)</sub> ≤ … ≤ ''x''<sub>(''n'')</sub> is the ordered sample from a uniform distribution ''U''(''a'',''b'') with unknown endpoints ''a'' and ''b''. The cumulative distribution function is ''F''(''x'';''a'',''b'') = (''x''−''a'')/(''b''−''a'') when ''x'' ∈ [''a'',''b'']. Therefore, individual spacings are given by
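:<math>D_1 = \frac{x_{(1)} - a}{b - a}, \qquad D_i = \frac{x_{(i)} - x_{(i-1)}}{b - a} \quad (i = 2, \ldots, n), \qquad D_{n+1} = \frac{b - x_{(n)}}{b - a}.</math>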
Calculating the geometric mean and then taking the logarithm, statistic ''S''<sub>''n''</sub> will be equal to
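:<math>S_n(a,b) = \frac{1}{n+1}\left[\ln\bigl(x_{(1)}-a\bigr) + \sum_{i=2}^{n}\ln\bigl(x_{(i)}-x_{(i-1)}\bigr) + \ln\bigl(b-x_{(n)}\bigr)\right] - \ln\bigl(b-a\bigr).</math>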
Here only three terms depend on the parameters ''a'' and ''b''. Differentiating with respect to those parameters and solving the resulting linear system, the maximum spacing estimates will be
:<math>\hat a = \frac{n\,x_{(1)} - x_{(n)}}{n-1}, \qquad \hat b = \frac{n\,x_{(n)} - x_{(1)}}{n-1}.</math>
These are known to be the uniformly minimum variance unbiased (UMVU) estimators for the continuous uniform distribution.
In comparison, the maximum likelihood estimates for this problem, <math>\hat a_{\text{MLE}} = x_{(1)}</math> and <math>\hat b_{\text{MLE}} = x_{(n)}</math>, are biased and have higher mean-squared error.
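As an illustration of this comparison, the following sketch simulates repeated uniform samples and compares the empirical mean-squared errors of the two pairs of estimators; the sample size, endpoints, number of replications, and seed are arbitrary illustrative choices, not values from the cited works.
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, n, reps = 2.0, 7.0, 10, 20000

mse_err, mle_err = [], []
for _ in range(reps):
    x = np.sort(rng.uniform(a_true, b_true, size=n))
    # Maximum spacing estimates of the uniform endpoints (the UMVU estimators)
    a_mse = (n * x[0] - x[-1]) / (n - 1)
    b_mse = (n * x[-1] - x[0]) / (n - 1)
    # Maximum likelihood estimates: the sample minimum and maximum
    a_mle, b_mle = x[0], x[-1]
    mse_err.append((a_mse - a_true) ** 2 + (b_mse - b_true) ** 2)
    mle_err.append((a_mle - a_true) ** 2 + (b_mle - b_true) ** 2)

print("mean squared error, maximum spacing estimates:", np.mean(mse_err))
print("mean squared error, maximum likelihood estimates:", np.mean(mle_err))  # typically larger
</syntaxhighlight>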
Properties
Consistency and efficiency
The maximum spacing estimator is a consistent estimator in that it converges in probability to the true value of the parameter, ''θ''<sub>0</sub>, as the sample size increases to infinity.
The consistency of maximum spacing estimation holds under much more general conditions than for maximum likelihood estimators. In particular, in cases where the underlying distribution is J-shaped, maximum likelihood will fail where MSE succeeds. An example of a J-shaped density is the Weibull distribution, specifically a shifted Weibull, with a shape parameter less than 1. The density will tend to infinity as ''x'' approaches the location parameter, rendering estimates of the other parameters inconsistent.
Maximum spacing estimators are also at least as asymptotically efficient as maximum likelihood estimators, where the latter exist. However, MSEs may exist in cases where MLEs do not.
Sensitivity
Maximum spacing estimators are sensitive to closely spaced observations, and especially ties. Given tied observations ''x''<sub>(''i'')</sub> = ''x''<sub>(''i''+1)</sub>, the corresponding spacing vanishes, ''D''<sub>''i''+1</sub>(''θ'') = ''F''(''x''<sub>(''i''+1)</sub>;''θ'') − ''F''(''x''<sub>(''i'')</sub>;''θ'') = 0, so that ''S''<sub>''n''</sub>(''θ'') = −∞ for every ''θ''. When the ties are due to multiple observations, the repeated spacings (those that would otherwise be zero) should be replaced by the corresponding likelihood. That is, one should substitute the density ''f''(''x''<sub>(''i'')</sub>;''θ'') for the spacing ''D''<sub>''i''</sub>(''θ''), since
:<math>\lim_{x_{(i-1)}\to x_{(i)}} \frac{F\bigl(x_{(i)};\theta\bigr) - F\bigl(x_{(i-1)};\theta\bigr)}{x_{(i)} - x_{(i-1)}} = f\bigl(x_{(i)};\theta\bigr).</math>
When ties are due to rounding error, Cheng and Stephens suggest another method to remove the effects. Given ''r'' tied observations from ''x''<sub>''i''</sub> to ''x''<sub>''i''+''r''−1</sub>, let ''δ'' represent the round-off error. All of the true values should then fall in the range ''x'' ± ''δ''. The corresponding points on the distribution should now fall between ''y''<sub>L</sub> = ''F''(''x'' − ''δ'';''θ'') and ''y''<sub>U</sub> = ''F''(''x'' + ''δ'';''θ''). Cheng and Stephens suggest assuming that the rounded values are uniformly spaced in this interval, by defining
:<math>D_j = \frac{y_U - y_L}{r}, \qquad j = i, \ldots, i+r-1.</math>
The MSE method is also sensitive to secondary clustering. One example of this phenomenon is when a set of observations is thought to come from a single normal distribution, but in fact comes from a mixture of normals with different means. A second example is when the data is thought to come from an exponential distribution, but actually comes from a gamma distribution. In the latter case, smaller spacings may occur in the lower tail. A high value of ''M''(''θ'') would indicate this secondary clustering effect, suggesting that a closer look at the data is required.
Moran test
The statistic ''S''<sub>''n''</sub>(''θ'') is also a form of Moran or Moran–Darling statistic, ''M''(''θ''), which can be used to test goodness of fit.
It has been shown that the statistic, when defined as
:<math>M(\theta) = -\sum_{i=1}^{n+1} \ln D_i(\theta),</math>
is asymptotically normal, and that a chi-squared approximation exists for small samples.
In the case where we know the true parameter ''θ''<sub>0</sub>, Cheng and Stephens show that the statistic ''M''(''θ''<sub>0</sub>) has a normal distribution with mean approximately (''n''+1)[ln(''n''+1) + ''γ''] and variance approximately (''n''+1)(π<sup>2</sup>/6 − 1), where ''γ'' is the Euler–Mascheroni constant, which is approximately 0.57722.
The distribution can also be approximated in small samples by that of ''C''<sub>1</sub> + ''C''<sub>2</sub>''χ''<sup>2</sup><sub>''n''</sub>, where the constants ''C''<sub>1</sub> and ''C''<sub>2</sub> are chosen to match the mean and variance of ''M''(''θ''), and where ''χ''<sup>2</sup><sub>''n''</sub> follows a chi-squared distribution with ''n'' degrees of freedom. Therefore, to test the hypothesis ''H''<sub>0</sub> that a random sample of ''n'' values comes from the distribution ''F''(''x'';''θ''), the statistic (''M''(''θ'') − ''C''<sub>1</sub>)/''C''<sub>2</sub> can be calculated. Then ''H''<sub>0</sub> should be rejected at significance level ''α'' if the value is greater than the critical value of the appropriate chi-squared distribution.
Where ''θ''<sub>0</sub> is being estimated by <math>\hat\theta</math>, Cheng and Stephens showed that <math>M(\hat\theta)</math> has the same asymptotic mean and variance as in the known-parameter case. However, the test statistic to be used requires the addition of a bias correction term that depends on ''k'', the number of parameters in the estimate.
Generalized maximum spacing
Alternate measures and spacings
The MSE method has been generalized to approximate other measures besides the Kullback–Leibler measure, and further expanded to investigate properties of estimators based on higher-order spacings, where an ''m''-order spacing would be defined as <math>F\bigl(x_{(i+m)};\theta\bigr) - F\bigl(x_{(i)};\theta\bigr)</math>.
Multivariate distributions
Maximum spacing methods have also been extended to the multivariate case. As there is no natural order for multivariate observations, two alternative approaches have been discussed: a geometric approach based on Dirichlet cells and a probabilistic approach based on a “nearest neighbor ball” metric.
See also
* Kullback–Leibler divergence
* Maximum likelihood
* Probability distribution