probability Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, ...

and

statistics Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, indust ...

, the Hellinger distance (closely related to, although different from, the

Bhattacharyya distance In statistics, the Bhattacharyya distance measures the similarity of two probability distributions. It is closely related to the Bhattacharyya coefficient which is a measure of the amount of overlap between two statistical samples or populations. ...

) is used to quantify the similarity between two

probability distributions In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon ...

. It is a type of ''f''-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. It is sometimes called the Jeffreys distance.

Definition

Measure theory

To define the Hellinger distance in terms of

measure theory In mathematics, the concept of a measure is a generalization and formalization of geometrical measures (length, area, volume) and other common notions, such as mass and probability of events. These seemingly distinct concepts have many simila ...

, let

P

and

Q

denote two

probability measure In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as ''countable additivity''. The difference between a probability measure and the more ge ...

s on a measure space

\mathcal

that are

absolutely continuous In calculus, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity. The notion of absolute continuity allows one to obtain generalizations of the relationship between the two central ope ...

with respect to an auxiliary measure

\lambda

. Such a measure always exists, e.g

\lambda = (P + Q)

. The square of the Hellinger distance between

P

and

Q

is defined as the quantity :

H^2(P,Q) = \frac\displaystyle \int_ \left(\sqrt - \sqrt\right)^2 \lambda(dx).

Here,

P(dx) = p(x)\lambda(dx)

and

Q(dx) = q(x) \lambda(dx)

, i.e.

p

and

q(x) =

are the Radon–Nikodym derivatives of ''P'' and ''Q'' respectively with respect to

\lambda

. This definition does not depend on

\lambda

, i.e. the Hellinger distance between ''P'' and ''Q'' does not change if

\lambda

is replaced with a different probability measure with respect to which both ''P'' and ''Q'' are absolutely continuous. For compactness, the above formula is often written as :

H^2(P,Q) = \frac\int_ \left(\sqrt - \sqrt\right)^2.

Probability theory using Lebesgue measure

To define the Hellinger distance in terms of elementary probability theory, we take λ to be the

Lebesgue measure In measure theory, a branch of mathematics, the Lebesgue measure, named after French mathematician Henri Lebesgue, is the standard way of assigning a measure to subsets of ''n''-dimensional Euclidean space. For ''n'' = 1, 2, or 3, it coincides wi ...

, so that ''dP'' / ''dλ'' and ''dQ'' / ''d''λ are simply

probability density function In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) ca ...

s. If we denote the densities as ''f'' and ''g'', respectively, the squared Hellinger distance can be expressed as a standard calculus integral :

H^2(f,g) =\frac\int \left(\sqrt - \sqrt\right)^2 \, dx = 1 - \int \sqrt \, dx,

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1. The Hellinger distance ''H''(''P'', ''Q'') satisfies the property (derivable from the

Cauchy–Schwarz inequality The Cauchy–Schwarz inequality (also called Cauchy–Bunyakovsky–Schwarz inequality) is considered one of the most important and widely used inequalities in mathematics. The inequality for sums was published by . The corresponding inequality f ...

) :

0\le H(P,Q) \le 1.

Discrete distributions

For two discrete probability distributions

P=(p_1, \ldots, p_k)

and

Q=(q_1, \ldots, q_k)

, their Hellinger distance is defined as :

H(P, Q) = \frac \; \sqrt,

which is directly related to the

Euclidean norm Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are Euclidean ...

of the difference of the square root vectors, i.e. :

H(P, Q) = \frac \; \bigl\, \sqrt - \sqrt \bigr\, _2 .

Also,

1 - H^2(P,Q) = \sum_^k \sqrt.

Properties

The Hellinger distance forms a bounded

metric Metric or metrical may refer to: * Metric system, an internationally adopted decimal system of measurement * An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement Mathematics In mathe ...

on the

space Space is the boundless three-dimensional extent in which objects and events have relative position and direction. In classical physics, physical space is often conceived in three linear dimensions, although modern physicists usually consi ...

of probability distributions over a given

probability space In probability theory, a probability space or a probability triple (\Omega, \mathcal, P) is a mathematical construct that provides a formal model of a random process or "experiment". For example, one can define a probability space which models t ...

. The maximum distance 1 is achieved when ''P'' assigns probability zero to every set to which ''Q'' assigns a positive probability, and vice versa. Sometimes the factor

1/\sqrt

in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two. The Hellinger distance is related to the Bhattacharyya coefficient

BC(P,Q)

as it can be defined as :

H(P,Q) = \sqrt.

Hellinger distances are used in the theory of sequential and asymptotic statistics. The squared Hellinger distance between two

normal distribution In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is : f(x) = \frac e^ The parameter \mu ...

P \sim \mathcal(\mu_1,\sigma_1^2)

and

Q \sim \mathcal(\mu_2,\sigma_2^2)

is: :

H^2(P, Q) = 1 - \sqrt \,  e^.

The squared Hellinger distance between two

multivariate normal distribution In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One ...

P \sim \mathcal(\mu_1,\Sigma_1)

and

Q \sim \mathcal(\mu_2,\Sigma_2)

is :

H^2(P, Q) = 1 - \frac 
              \exp\left\

The squared Hellinger distance between two

exponential distribution In probability theory and statistics, the exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average ...

P \sim \mathrm(\alpha)

and

Q \sim \mathrm(\beta)

is: :

H^2(P, Q) = 1 - \frac.

The squared Hellinger distance between two

Weibull distribution In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It is named after Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Maurice Re ...

P \sim \mathrm(k,\alpha)

and

Q \sim \mathrm(k,\beta)

(where

k

is a common shape parameter and

\alpha\, , \beta

are the scale parameters respectively): :

H^2(P, Q) = 1 - \frac.

The squared Hellinger distance between two

Poisson distribution In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known co ...

s with rate parameters

\alpha

and

\beta

, so that

P \sim \mathrm(\alpha)

and

Q \sim \mathrm(\beta)

, is: :

H^2(P,Q) = 1-e^.

The squared Hellinger distance between two

beta distribution In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval , 1in terms of two positive parameters, denoted by ''alpha'' (''α'') and ''beta'' (''β''), that appear as ...

P \sim \text(a_1,b_1)

and

Q \sim \text(a_2, b_2)

is: :

H^2(P,Q) = 1 - \frac

where

B

is the

beta function In mathematics, the beta function, also called the Euler integral of the first kind, is a special function that is closely related to the gamma function and to binomial coefficients. It is defined by the integral : \Beta(z_1,z_2) = \int_0^1 t^( ...

. The squared Hellinger distance between two

gamma distribution In probability theory and statistics, the gamma distribution is a two- parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma di ...

P \sim \text(a_1,b_1)

and

Q \sim \text(a_2, b_2)

is: :

H^2(P,Q) = 1 - \Gamma\left(\right)\left(\frac\right)^\sqrt

where

\Gamma

is the

gamma function In mathematics, the gamma function (represented by , the capital letter gamma from the Greek alphabet) is one commonly used extension of the factorial function to complex numbers. The gamma function is defined for all complex numbers excep ...

Connection with total variation distance

The Hellinger distance

H(P,Q)

and the

total variation distance In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational dist ...

(or statistical distance)

\delta(P,Q)

are related as follows: :

H^2(P,Q) \leq \delta(P,Q) \leq  \sqrtH(P,Q)\,.

The constants in this inequality may change depending on which renormalization you choose (

1/2

1/\sqrt

). These inequalities follow immediately from the inequalities between the 1-norm and the

2-norm In mathematics, a norm is a function from a real or complex vector space to the non-negative real numbers that behaves in certain ways like the distance from the origin: it commutes with scaling, obeys a form of the triangle inequality, and is z ...

Notes

References

* * * {{cite book , author=Pollard, David E. , title=A user's guide to measure theoretic probability , publisher=Cambridge University Press , location=Cambridge, UK , year=2002 , isbn=0-521-00289-3 Theory of probability distributions F-divergences Statistical distance

Definition

Measure theory

Probability theory using Lebesgue measure

Discrete distributions

Properties

Connection with total variation distance

See also

Notes

References