In
probability
Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, ...
and
statistics
Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...
, the Hellinger distance (closely related to, although different from, the
Bhattacharyya distance In statistics, the Bhattacharyya distance measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient which is a measure of the amount of overlap between two statistical samples or populations.
...
) is used to quantify the similarity between two
probability distributions
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon ...
. It is a type of
''f''-divergence. The Hellinger distance is defined in terms of the
Hellinger integral, which was introduced by
Ernst Hellinger in 1909.
It is sometimes called the Jeffreys distance.
Definition
Measure theory
To define the Hellinger distance in terms of
measure theory
In mathematics, the concept of a measure is a generalization and formalization of geometrical measures (length, area, volume) and other common notions, such as mass and probability of events. These seemingly distinct concepts have many simila ...
, let
and
denote two
probability measure
In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as ''countable additivity''. The difference between a probability measure and the more ge ...
s on a measure space
that are
absolutely continuous
In calculus, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity. The notion of absolute continuity allows one to obtain generalizations of the relationship between the two central ope ...
with respect to an auxiliary measure
. Such a measure always exists, e.g
. The square of the Hellinger distance between
and
is defined as the quantity
:
Here,
and
, i.e.
and
are the
Radon–Nikodym derivatives of ''P'' and ''Q'' respectively with respect to
. This definition does not depend on
, i.e. the Hellinger distance between ''P'' and ''Q'' does not change if
is replaced with a different probability measure with respect to which both ''P'' and ''Q'' are absolutely continuous. For compactness, the above formula is often written as
:
Probability theory using Lebesgue measure
To define the Hellinger distance in terms of elementary probability theory, we take λ to be the
Lebesgue measure
In measure theory, a branch of mathematics, the Lebesgue measure, named after French mathematician Henri Lebesgue, is the standard way of assigning a measure to subsets of ''n''-dimensional Euclidean space. For ''n'' = 1, 2, or 3, it coincides wi ...
, so that ''dP'' / ''dλ'' and ''dQ'' / ''d''λ are simply
probability density function
In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) ca ...
s. If we denote the densities as ''f'' and ''g'', respectively, the squared Hellinger distance can be expressed as a standard calculus integral
:
where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
The Hellinger distance ''H''(''P'', ''Q'') satisfies the property (derivable from the
Cauchy–Schwarz inequality
The Cauchy–Schwarz inequality (also called Cauchy–Bunyakovsky–Schwarz inequality) is considered one of the most important and widely used inequalities in mathematics.
The inequality for sums was published by . The corresponding inequality f ...
)
:
Discrete distributions
For two discrete probability distributions
and
,
their Hellinger distance is defined as
:
which is directly related to the
Euclidean norm
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are Euclidean ...
of the difference of the square root vectors, i.e.
:
Also,
Properties
The Hellinger distance forms a
bounded metric
Metric or metrical may refer to:
* Metric system, an internationally adopted decimal system of measurement
* An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement
Mathematics
In mathe ...
on the
space
Space is the boundless three-dimensional extent in which objects and events have relative position and direction. In classical physics, physical space is often conceived in three linear dimensions, although modern physicists usually consi ...
of probability distributions over a given
probability space
In probability theory, a probability space or a probability triple (\Omega, \mathcal, P) is a mathematical construct that provides a formal model of a random process or "experiment". For example, one can define a probability space which models t ...
.
The maximum distance 1 is achieved when ''P'' assigns probability zero to every set to which ''Q'' assigns a positive probability, and vice versa.
Sometimes the factor
in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.
The Hellinger distance is related to the
Bhattacharyya coefficient as it can be defined as
:
Hellinger distances are used in the theory of
sequential and
asymptotic statistics.
The squared Hellinger distance between two
normal distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu ...
s
and
is:
:
The squared Hellinger distance between two
multivariate normal distribution
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One ...
s
and
is
:
The squared Hellinger distance between two
exponential distribution
In probability theory and statistics, the exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average ...
s
and
is:
:
The squared Hellinger distance between two
Weibull distribution
In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It is named after Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Maurice Re ...
s
and
(where
is a common shape parameter and
are the scale parameters respectively):
:
The squared Hellinger distance between two
Poisson distribution
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known co ...
s with rate parameters
and
, so that
and
, is:
:
The squared Hellinger distance between two
beta distribution
In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval , 1in terms of two positive parameters, denoted by ''alpha'' (''α'') and ''beta'' (''β''), that appear as ...
s
and
is:
:
where
is the
beta function
In mathematics, the beta function, also called the Euler integral of the first kind, is a special function that is closely related to the gamma function and to binomial coefficients. It is defined by the integral
: \Beta(z_1,z_2) = \int_0^1 t^( ...
.
The squared Hellinger distance between two
gamma distribution
In probability theory and statistics, the gamma distribution is a two- parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma di ...
s
and
is:
:
where
is the
gamma function
In mathematics, the gamma function (represented by , the capital letter gamma from the Greek alphabet) is one commonly used extension of the factorial function to complex numbers. The gamma function is defined for all complex numbers excep ...
.
Connection with total variation distance
The Hellinger distance
and the
total variation distance In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational dist ...
(or statistical distance)
are related as follows:
:
The constants in this inequality may change depending on which renormalization you choose (
or
).
These inequalities follow immediately from the inequalities between the
1-norm and the
2-norm
In mathematics, a norm is a function from a real or complex vector space to the non-negative real numbers that behaves in certain ways like the distance from the origin: it commutes with scaling, obeys a form of the triangle inequality, and is z ...
.
See also
*
Statistical distance
*
Kullback–Leibler divergence
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted D_\text(P \parallel Q), is a type of statistical distance: a measure of how one probability distribution ''P'' is different fr ...
*
Bhattacharyya distance In statistics, the Bhattacharyya distance measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient which is a measure of the amount of overlap between two statistical samples or populations.
...
*
Total variation distance In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational dist ...
*
Fisher information metric In information geometry, the Fisher information metric is a particular Riemannian metric which can be defined on a smooth statistical manifold, ''i.e.'', a smooth manifold whose points are probability measures defined on a common probability spa ...
Notes
References
*
*
* {{cite book , author=Pollard, David E. , title=A user's guide to measure theoretic probability , publisher=Cambridge University Press , location=Cambridge, UK , year=2002 , isbn=0-521-00289-3
Theory of probability distributions
F-divergences
Statistical distance