In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of ''f''-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. It is sometimes called the Jeffreys distance.


Definition


Measure theory

To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures on a measure space \mathcal{X} that are absolutely continuous with respect to an auxiliary measure \lambda. Such a measure always exists, e.g. \lambda = (P + Q). The square of the Hellinger distance between P and Q is defined as the quantity

:H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \lambda(dx).

Here, P(dx) = p(x)\lambda(dx) and Q(dx) = q(x)\lambda(dx), i.e. p and q are the Radon–Nikodym derivatives of ''P'' and ''Q'' respectively with respect to \lambda. This definition does not depend on \lambda: the Hellinger distance between ''P'' and ''Q'' does not change if \lambda is replaced with a different measure with respect to which both ''P'' and ''Q'' are absolutely continuous. For compactness, the above formula is often written as

:H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left(\sqrt{P(dx)} - \sqrt{Q(dx)}\right)^2.
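The abstract definition can be made concrete on a finite sample space, where the auxiliary measure and the Radon–Nikodym derivatives reduce to pointwise ratios. A minimal Python sketch (the helper name hellinger_measure is hypothetical, chosen for illustration):

<syntaxhighlight lang="python">
import numpy as np

def hellinger_measure(P, Q):
    """Squared Hellinger distance computed via an explicit dominating measure.

    P, Q: arrays of point masses on a common finite sample space.
    Uses lam = P + Q as the auxiliary measure; p and q below are the
    Radon-Nikodym derivatives dP/dlam and dQ/dlam.
    """
    lam = P + Q                   # auxiliary measure dominating P and Q
    mask = lam > 0                # skip points where both measures vanish
    p = P[mask] / lam[mask]       # dP/dlam
    q = Q[mask] / lam[mask]       # dQ/dlam
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2 * lam[mask])

P = np.array([0.5, 0.3, 0.2, 0.0])
Q = np.array([0.1, 0.4, 0.2, 0.3])
print(hellinger_measure(P, Q))
</syntaxhighlight>

Replacing lam with (P + Q)/2, or any other measure giving positive mass to every point where P or Q does, leaves the result unchanged, mirroring the independence from \lambda noted above.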


Probability theory using Lebesgue measure

To define the Hellinger distance in terms of elementary probability theory, we take \lambda to be the Lebesgue measure, so that ''dP''/''dλ'' and ''dQ''/''dλ'' are simply probability density functions. If we denote the densities as ''f'' and ''g'', respectively, the squared Hellinger distance can be expressed as a standard calculus integral

:H^2(f,g) = \frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \, dx = 1 - \int \sqrt{f(x)\, g(x)} \, dx,

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1. The Hellinger distance ''H''(''P'', ''Q'') satisfies the property (derivable from the Cauchy–Schwarz inequality)

: 0 \le H(P,Q) \le 1.
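As a numerical illustration of the integral form, the expression 1 - \int \sqrt{fg} can be evaluated by standard quadrature. A short Python sketch (using SciPy; the densities chosen are arbitrary examples):

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def hellinger2(f, g, lo=-np.inf, hi=np.inf):
    """Squared Hellinger distance via H^2 = 1 - integral of sqrt(f * g)."""
    bc, _ = quad(lambda x: np.sqrt(f(x) * g(x)), lo, hi)
    return 1.0 - bc

f = norm(loc=0.0, scale=1.0).pdf   # density of N(0, 1)
g = norm(loc=1.0, scale=2.0).pdf   # density of N(1, 4)
h = np.sqrt(hellinger2(f, g))
assert 0.0 <= h <= 1.0             # the bound derived from Cauchy-Schwarz
</syntaxhighlight>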


Discrete distributions

For two discrete probability distributions P=(p_1, \ldots, p_k) and Q=(q_1, \ldots, q_k), their Hellinger distance is defined as

: H(P, Q) = \frac{1}{\sqrt{2}} \, \sqrt{\sum_{i=1}^k \left(\sqrt{p_i} - \sqrt{q_i}\right)^2},

which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.

: H(P, Q) = \frac{1}{\sqrt{2}} \, \bigl\| \sqrt{P} - \sqrt{Q} \bigr\|_2 .

Also,

: 1 - H^2(P,Q) = \sum_{i=1}^k \sqrt{p_i q_i}.
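The Euclidean-norm reading translates directly into code. A minimal NumPy sketch (the function name hellinger_discrete is hypothetical):

<syntaxhighlight lang="python">
import numpy as np

def hellinger_discrete(p, q):
    # H(P, Q) = (1 / sqrt(2)) * || sqrt(P) - sqrt(Q) ||_2
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

p = np.array([0.36, 0.48, 0.16])
q = np.array([0.30, 0.50, 0.20])
h = hellinger_discrete(p, q)
bc = np.sum(np.sqrt(p * q))         # Bhattacharyya coefficient
assert np.isclose(1.0 - h**2, bc)   # 1 - H^2 = sum_i sqrt(p_i q_i)
</syntaxhighlight>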


Properties

The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space. The maximum distance 1 is achieved when ''P'' assigns probability zero to every set to which ''Q'' assigns a positive probability, and vice versa. Sometimes the factor 1/\sqrt{2} in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.

The Hellinger distance is related to the Bhattacharyya coefficient BC(P,Q), as it can be defined as

: H(P,Q) = \sqrt{1 - BC(P,Q)}.

Hellinger distances are used in the theory of sequential and asymptotic statistics.

The squared Hellinger distance between two normal distributions P \sim \mathcal{N}(\mu_1,\sigma_1^2) and Q \sim \mathcal{N}(\mu_2,\sigma_2^2) is:

: H^2(P, Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \, e^{-\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}}.
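This closed form can be checked against direct quadrature of the defining integral. A small Python sketch (parameter values are arbitrary; hellinger2_normal is a hypothetical helper):

<syntaxhighlight lang="python">
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def hellinger2_normal(mu1, s1, mu2, s2):
    """Closed-form squared Hellinger distance for N(mu1, s1^2) and N(mu2, s2^2)."""
    coeff = np.sqrt(2.0 * s1 * s2 / (s1**2 + s2**2))
    expo = -0.25 * (mu1 - mu2) ** 2 / (s1**2 + s2**2)
    return 1.0 - coeff * np.exp(expo)

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
closed = hellinger2_normal(mu1, s1, mu2, s2)
f, g = norm(mu1, s1).pdf, norm(mu2, s2).pdf
numeric = 1.0 - quad(lambda x: np.sqrt(f(x) * g(x)), -np.inf, np.inf)[0]
assert np.isclose(closed, numeric)
</syntaxhighlight>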
The squared Hellinger distance between two multivariate normal distributions P \sim \mathcal{N}(\mu_1,\Sigma_1) and Q \sim \mathcal{N}(\mu_2,\Sigma_2) is

: H^2(P, Q) = 1 - \frac{\det(\Sigma_1)^{1/4}\det(\Sigma_2)^{1/4}}{\det\left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{1/2}} \exp\left\{-\frac{1}{8}(\mu_1-\mu_2)^T \left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{-1}(\mu_1-\mu_2)\right\}.

The squared Hellinger distance between two exponential distributions P \sim \mathrm{Exp}(\alpha) and Q \sim \mathrm{Exp}(\beta) is:

: H^2(P, Q) = 1 - \frac{2\sqrt{\alpha\beta}}{\alpha+\beta}.

The squared Hellinger distance between two Weibull distributions P \sim \mathrm{W}(k,\alpha) and Q \sim \mathrm{W}(k,\beta) (where k is a common shape parameter and \alpha, \beta are the scale parameters respectively):

: H^2(P, Q) = 1 - \frac{2(\alpha\beta)^{k/2}}{\alpha^k+\beta^k}.

The squared Hellinger distance between two Poisson distributions with rate parameters \alpha and \beta, so that P \sim \mathrm{Poisson}(\alpha) and Q \sim \mathrm{Poisson}(\beta), is:

: H^2(P,Q) = 1 - e^{-\frac{1}{2}\left(\sqrt{\alpha}-\sqrt{\beta}\right)^2}.

The squared Hellinger distance between two beta distributions P \sim \text{Beta}(a_1,b_1) and Q \sim \text{Beta}(a_2, b_2) is:

: H^2(P,Q) = 1 - \frac{B\left(\frac{a_1+a_2}{2}, \frac{b_1+b_2}{2}\right)}{\sqrt{B(a_1,b_1)\, B(a_2,b_2)}},

where B is the beta function. The squared Hellinger distance between two gamma distributions P \sim \text{Gamma}(a_1,b_1) and Q \sim \text{Gamma}(a_2, b_2) is:

: H^2(P,Q) = 1 - \Gamma\left(\frac{a_1+a_2}{2}\right)\left(\frac{b_1+b_2}{2}\right)^{-\frac{a_1+a_2}{2}}\sqrt{\frac{b_1^{a_1}\, b_2^{a_2}}{\Gamma(a_1)\,\Gamma(a_2)}},

where \Gamma is the gamma function.
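The same style of verification applies to the discrete closed forms; for instance, the Poisson expression can be checked against a truncated version of the defining sum. A Python sketch (the truncation point 200 is an arbitrary choice that captures essentially all the mass for these rates):

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import poisson

alpha, beta = 3.0, 5.0
# Closed form: H^2 = 1 - exp(-(sqrt(alpha) - sqrt(beta))^2 / 2)
closed = 1.0 - np.exp(-0.5 * (np.sqrt(alpha) - np.sqrt(beta)) ** 2)

# Defining sum 1 - sum_n sqrt(P(n) Q(n)), truncated where the mass is negligible
n = np.arange(200)
bc = np.sum(np.sqrt(poisson.pmf(n, alpha) * poisson.pmf(n, beta)))
assert np.isclose(closed, 1.0 - bc)
</syntaxhighlight>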


Connection with total variation distance

The Hellinger distance H(P,Q) and the total variation distance (or statistical distance) \delta(P,Q) are related as follows:

: H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}\, H(P,Q).

The constants in this inequality may change depending on which renormalization is chosen (1/2 or 1/\sqrt{2}). These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
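Both bounds are easy to observe numerically under the conventions used here (\delta(P,Q) as half the 1-norm of P - Q, and the 1/\sqrt{2} factor in H). A Python sketch over random probability vectors:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))       # random probability vectors
    q = rng.dirichlet(np.ones(5))
    h = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)
    tv = 0.5 * np.sum(np.abs(p - q))    # total variation distance
    # H^2 <= delta <= sqrt(2) * H, with a small tolerance for rounding
    assert h**2 - 1e-12 <= tv <= np.sqrt(2) * h + 1e-12
</syntaxhighlight>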


See also

* Statistical distance
* Kullback–Leibler divergence
* Bhattacharyya distance
* Total variation distance
* Fisher information metric

