Energy distance is a statistical distance between probability distributions. If X and Y are independent random vectors in ''R''<sup>''d''</sup> with cumulative distribution functions (cdf) F and G respectively, then the energy distance between the distributions F and G is defined to be the square root of
:<math>D^2(F, G) = 2\operatorname{E}\|X - Y\| - \operatorname{E}\|X - X'\| - \operatorname{E}\|Y - Y'\| \geq 0,</math>
where (X, X', Y, Y') are independent, the cdf of X and X' is F, the cdf of Y and Y' is G, <math>\operatorname{E}</math> is the expected value, and <math>\|\cdot\|</math> denotes the length of a vector. Energy distance satisfies all the axioms of a metric; in particular, it characterizes the equality of distributions: D(F,G) = 0 if and only if F = G.
Energy distance for statistical applications was introduced in 1985 by Gábor J. Székely, who proved that for real-valued random variables <math>D^2(F, G)</math> is exactly twice Harald Cramér's distance:
:<math>\int_{-\infty}^{\infty} (F(x) - G(x))^2 \, dx.</math>
For a simple proof of this equivalence, see Székely (2002).
In higher dimensions, however, the two distances are different because the energy distance is rotation invariant while Cramér's distance is not. (Notice that Cramér's distance is not the same as the distribution-free Cramér–von Mises criterion.)
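The one-dimensional relationship can be checked numerically. The sketch below is illustrative only: it assumes two small made-up samples, treats each as its own empirical distribution, and uses the hypothetical helper names `energy_distance_sq` and `cramer_distance`. The first computes 2E|X − Y| − E|X − X'| − E|Y − Y'| by averaging over all pairs, the second integrates the squared difference of the two empirical cdfs exactly.

```python
from bisect import bisect_right

def energy_distance_sq(xs, ys):
    """Plug-in value of 2E|X-Y| - E|X-X'| - E|Y-Y'| for two real samples."""
    n, m = len(xs), len(ys)
    a = sum(abs(x - y) for x in xs for y in ys) / (n * m)
    b = sum(abs(x - x2) for x in xs for x2 in xs) / (n * n)
    c = sum(abs(y - y2) for y in ys for y2 in ys) / (m * m)
    return 2 * a - b - c

def cramer_distance(xs, ys):
    """Exact integral of (F_n(t) - G_m(t))^2 dt for the two empirical cdfs."""
    sx, sy = sorted(xs), sorted(ys)
    grid = sorted(set(sx) | set(sy))
    total = 0.0
    # Both empirical cdfs are constant on each interval [left, right).
    for left, right in zip(grid, grid[1:]):
        f = bisect_right(sx, left) / len(sx)
        g = bisect_right(sy, left) / len(sy)
        total += (f - g) ** 2 * (right - left)
    return total

# Made-up data, chosen only for illustration.
xs = [0.0, 1.0, 3.0]
ys = [0.5, 2.0]
print(energy_distance_sq(xs, ys), 2 * cramer_distance(xs, ys))
```

The two printed values agree up to floating-point rounding, which is the "exactly twice" statement applied to empirical distributions.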
Generalization to metric spaces
One can generalize the notion of energy distance to probability distributions on metric spaces. Let <math>(M, d)</math> be a metric space with its Borel sigma algebra <math>\mathcal{B}(M)</math>. Let <math>\mathcal{P}(M)</math> denote the collection of all probability measures on the measurable space <math>(M, \mathcal{B}(M))</math>. If μ and ν are probability measures in <math>\mathcal{P}(M)</math>, then the energy-distance <math>D</math> of μ and ν can be defined as the square root of
:<math>D^2(\mu, \nu) = 2\operatorname{E}[d(X, Y)] - \operatorname{E}[d(X, X')] - \operatorname{E}[d(Y, Y')],</math>
where X and X' are independent with distribution μ, and Y and Y' are independent with distribution ν.
This is not necessarily non-negative, however. If <math>(M, d)</math> is a strongly negative definite kernel, then <math>D</math> is a metric, and conversely.
[Klebanov, L. B. (2005) N-distances and their Applications, Karolinum Press, Charles University, Prague.] The non-negativity condition is expressed by saying that <math>(M, d)</math> has negative type. Negative type is not sufficient for <math>D</math> to be a metric; the latter condition is expressed by saying that <math>(M, d)</math> has strong negative type. In this situation, the energy distance is zero if and only if X and Y are identically distributed. An example of a metric of negative type but not of strong negative type is the plane with the taxicab metric. All Euclidean spaces and even separable Hilbert spaces have strong negative type.
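The taxicab example can be made concrete with a small computation. The following is a minimal sketch, not a general implementation: it assumes finitely supported measures stored as dictionaries from points to probabilities (a representation chosen here purely for illustration), and the two measures `mu` and `nu` are made-up examples with identical coordinate marginals.

```python
def energy_distance_sq(mu, nu, d):
    """2 E d(X,Y) - E d(X,X') - E d(Y,Y') for finitely supported measures.

    mu and nu map points to probabilities; d is a metric on those points.
    """
    def expected(p, q):
        return sum(pw * qw * d(u, v) for u, pw in p.items() for v, qw in q.items())
    return 2 * expected(mu, nu) - expected(mu, mu) - expected(nu, nu)

def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def taxicab(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

# Two different measures on the corners of the unit square.
mu = {(0.0, 0.0): 0.5, (1.0, 1.0): 0.5}
nu = {(1.0, 0.0): 0.5, (0.0, 1.0): 0.5}

print(energy_distance_sq(mu, nu, euclidean))  # strictly positive
print(energy_distance_sq(mu, nu, taxicab))    # zero although mu != nu
```

With the Euclidean metric the two measures are separated, while with the taxicab metric the value is zero even though the measures differ, so the taxicab plane cannot have strong negative type.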
In the literature on kernel methods for machine learning, these generalized notions of energy distance are studied under the name of maximum mean discrepancy. The equivalence of distance-based and kernel methods for hypothesis testing is covered by several authors.
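One way to see the connection is through a kernel built from the distance itself. The sketch below assumes the distance-induced kernel k(x, y) = ½[d(x, z0) + d(y, z0) − d(x, y)] with an arbitrary base point z0; this kernel and the helper names are choices made for this illustration and are not taken from the text above. With this choice, expanding the expectations shows that the squared energy distance equals twice the squared maximum mean discrepancy, and the code checks that on made-up real-valued samples.

```python
def mean_pairwise(f, us, vs):
    """Average of f(u, v) over all pairs drawn from the two samples."""
    return sum(f(u, v) for u in us for v in vs) / (len(us) * len(vs))

def dist(x, y):
    return abs(x - y)

def induced_kernel(x, y, z0=0.0):
    # Distance-induced kernel centred at the base point z0 (assumed choice).
    return 0.5 * (dist(x, z0) + dist(y, z0) - dist(x, y))

# Made-up data, chosen only for illustration.
xs = [0.0, 1.0, 3.0]
ys = [0.5, 2.0]

energy_sq = (2 * mean_pairwise(dist, xs, ys)
             - mean_pairwise(dist, xs, xs)
             - mean_pairwise(dist, ys, ys))
mmd_sq = (mean_pairwise(induced_kernel, xs, xs)
          + mean_pairwise(induced_kernel, ys, ys)
          - 2 * mean_pairwise(induced_kernel, xs, ys))

print(energy_sq, 2 * mmd_sq)
```

The terms involving z0 cancel in the discrepancy, so the agreement does not depend on the base point.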
Energy statistics
A related statistical concept, the notion of E-statistic or energy-statistic, was introduced by Gábor J. Székely in the 1980s when he was giving colloquium lectures in Budapest, Hungary and at MIT, Yale, and Columbia. This concept is based on the notion of Newton's potential energy.
[Székely, G.J. (2002) E-statistics: The Energy of Statistical Samples, Technical Report BGSU No 02-16.] The idea is to consider statistical observations as heavenly bodies governed by a statistical potential energy which is zero only when an underlying statistical null hypothesis is true. Energy statistics are functions of distances between statistical observations.
Energy distance and E-statistic were considered as N-distances and N-statistic in Zinger A.A., Kakosyan A.V., Klebanov L.B., "Characterization of distributions by means of mean values of some statistics in connection with some probability metrics", Stability Problems for Stochastic Models, Moscow, VNIISI, 1989, 47-55 (in Russian); English translation: A. A. Zinger, A. V. Kakosyan, L. B. Klebanov, "A characterization of distributions by mean values of statistics and certain probabilistic metrics", Journal of Soviet Mathematics (1992). The same paper gave a definition of a strongly negative definite kernel and provided the generalization to metric spaces discussed above. The book by Klebanov (2005) gives these results and their applications to statistical testing as well, together with some applications to recovering a measure from its potential.
Testing for equal distributions
Consider the null hypothesis that two random variables, ''X'' and ''Y'', have the same probability distributions: <math>\mu = \nu</math>. For statistical samples from ''X'' and ''Y'':
:<math>x_1, \dots, x_n</math> and <math>y_1, \dots, y_m</math>,
the following arithmetic averages of distances are computed between and within the X and the Y samples:
:<math>A := \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \|x_i - y_j\|, \quad B := \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \|x_i - x_j\|, \quad C := \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \|y_i - y_j\|.</math>
The E-statistic of the underlying null hypothesis is defined as follows:
:<math>\mathcal{E}_{n,m}(X, Y) := 2A - B - C.</math>
One can prove that <math>\mathcal{E}_{n,m}(X, Y) \geq 0</math> and that the corresponding population value is zero if and only if ''X'' and ''Y'' have the same distribution (<math>\mu = \nu</math>). Under this null hypothesis the test statistic
:<math>T = \frac{nm}{n + m} \mathcal{E}_{n,m}(X, Y)</math>
converges in distribution to a quadratic form of independent standard normal random variables. Under the alternative hypothesis ''T'' tends to infinity. This makes it possible to construct a consistent statistical test, the energy test for equal distributions.
The E-coefficient of inhomogeneity can also be introduced. This is always between 0 and 1 and is defined as
:<math>H = \frac{D^2(F_X, F_Y)}{2\operatorname{E}\|X - Y\|} = \frac{2\operatorname{E}\|X - Y\| - \operatorname{E}\|X - X'\| - \operatorname{E}\|Y - Y'\|}{2\operatorname{E}\|X - Y\|},</math>
where <math>\operatorname{E}</math> denotes the expected value. ''H'' = 0 exactly when ''X'' and ''Y'' have the same distribution.
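The two-sample quantities of this section can be sketched in a few lines. The code below is illustrative and makes several assumptions: points are tuples with Euclidean distance, the function names are hypothetical, and the p-value is calibrated by random permutations of the pooled sample, which is a common practical substitute for the limiting quadratic form and is not described in the text above. The sample version of H simply replaces the expectations by the pairwise averages.

```python
import random

def mean_dist(us, vs):
    """Average Euclidean distance over all pairs (u, v); points are tuples."""
    total = 0.0
    for u in us:
        for v in vs:
            total += sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    return total / (len(us) * len(vs))

def e_statistic(xs, ys):
    """Return the E-statistic 2A - B - C and the cross average A."""
    a = mean_dist(xs, ys)
    return 2 * a - mean_dist(xs, xs) - mean_dist(ys, ys), a

def t_statistic(xs, ys):
    """T = nm / (n + m) times the E-statistic."""
    n, m = len(xs), len(ys)
    return n * m / (n + m) * e_statistic(xs, ys)[0]

def inhomogeneity(xs, ys):
    """Sample analogue of the E-coefficient of inhomogeneity H."""
    e, a = e_statistic(xs, ys)
    return e / (2 * a) if a > 0 else 0.0

def permutation_p_value(xs, ys, n_perm=199, seed=0):
    """Permutation p-value for T (assumed calibration, see the lead-in)."""
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    observed = t_statistic(xs, ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if t_statistic(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)

# Made-up, clearly separated samples in the plane.
xs = [(i / 8, 0.0) for i in range(8)]
ys = [(10 + i / 8, 0.0) for i in range(8)]
print(t_statistic(xs, ys), inhomogeneity(xs, ys), permutation_p_value(xs, ys))
```

For these well-separated samples the statistic is large, H is close to 1, and the permutation p-value is small; comparing a sample with itself gives an E-statistic and an H of zero.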
Goodness-of-fit
A multivariate goodness-of-fit measure is defined for distributions in arbitrary dimension (not restricted by sample size). The energy goodness-of-fit statistic is
:<math>Q_n = n \left( \frac{2}{n} \sum_{i=1}^{n} \operatorname{E}\|x_i - X\|^\alpha - \operatorname{E}\|X - X'\|^\alpha - \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \|x_i - x_j\|^\alpha \right),</math>
where X and X' are independent and identically distributed according to the hypothesized distribution, and <math>\alpha \in (0, 2)</math>. The only required condition is that X has a finite <math>\alpha</math>-th moment under the null hypothesis. Under the null hypothesis <math>\operatorname{E} Q_n = \operatorname{E}\|X - X'\|^\alpha</math>, and the asymptotic distribution of Q<sub>n</sub> is a quadratic form of centered Gaussian random variables. Under an alternative hypothesis, Q<sub>n</sub> tends to infinity stochastically, and thus determines a statistically consistent test. For most applications the exponent 1 (Euclidean distance) can be applied. The important special case of testing multivariate normality is implemented in the ''energy'' package for R. Tests are also developed for heavy-tailed distributions such as Pareto (power law) or stable distributions by application of exponents in (0,1).
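As a rough illustration of the goodness-of-fit statistic, the sketch below treats the simplest case only: a univariate sample, exponent 1, and the fully specified null hypothesis that the data are standard normal. These restrictions are assumptions of the example and differ from the multivariate test with estimated parameters mentioned above. It uses the closed forms E|x − Z| = 2φ(x) + x(2Φ(x) − 1) and E|Z − Z'| = 2/√π for standard normal Z and Z'; the function names and the two test samples are hypothetical.

```python
from math import pi, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def expected_abs_dev(x):
    """E|x - Z| for Z ~ N(0, 1): 2*phi(x) + x*(2*Phi(x) - 1)."""
    return 2 * STD_NORMAL.pdf(x) + x * (2 * STD_NORMAL.cdf(x) - 1)

def energy_gof_normal(sample):
    """Q_n with exponent 1 for the simple hypothesis that the data are N(0, 1)."""
    n = len(sample)
    first = 2 / n * sum(expected_abs_dev(x) for x in sample)
    second = 2 / sqrt(pi)  # E|Z - Z'| for independent standard normals
    third = sum(abs(x - y) for x in sample for y in sample) / n ** 2
    return n * (first - second - third)

n = 20
# Deterministic stand-ins for data: normal quantiles, and the same points shifted.
near_normal = [STD_NORMAL.inv_cdf((i + 0.5) / n) for i in range(n)]
shifted = [x + 5.0 for x in near_normal]

print(energy_gof_normal(near_normal), energy_gof_normal(shifted))
```

The statistic stays small for the sample that follows the hypothesized distribution and becomes much larger for the shifted sample, which is the behaviour a consistent test relies on.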
Applications
Applications include:
* Hierarchical clustering (a generalization of Ward's method)
* Testing multivariate normality
* Testing the multi-sample hypothesis of equal distributions
* Change point detection
* Multivariate independence:
** distance correlation,
** Brownian covariance.
* Scoring rules:
:Gneiting and Raftery apply energy distance to develop a new and very general type of proper scoring rule for probabilistic predictions, the energy score.
* Robust statistics
* Scenario reduction
* Gene selection
* Microarray data analysis
* Material structure analysis
* Morphometric and chemometric data
Applications of energy statistics are implemented in the open source ''energy'' package for R.