Hellinger Distance
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of ''f''-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. It is sometimes called the Jeffreys distance.


Definition


Measure theory

To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures on a measure space \mathcal{X} that are absolutely continuous with respect to an auxiliary measure \lambda. Such a measure always exists, e.g. \lambda = \frac{1}{2}(P + Q). The square of the Hellinger distance between P and Q is defined as the quantity

:H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \lambda(dx).

Here, P(dx) = p(x)\lambda(dx) and Q(dx) = q(x)\lambda(dx), i.e. p and q are the Radon–Nikodym derivatives of ''P'' and ''Q'' respectively with respect to \lambda. This definition does not depend on \lambda, i.e. the Hellinger distance between ''P'' and ''Q'' does not change if \lambda is replaced with a different probability measure with respect to which both ''P'' and ''Q'' are absolutely continuous. For compactness, the above formula is often written as

:H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left(\sqrt{P(dx)} - \sqrt{Q(dx)}\right)^2.
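To see why the definition does not depend on the dominating measure (a sketch of the standard argument, not spelled out above): if \nu is any other measure dominating both ''P'' and ''Q'' with \lambda \ll \nu, the chain rule for Radon–Nikodym derivatives gives

:\int \left(\sqrt{\tfrac{dP}{d\lambda}} - \sqrt{\tfrac{dQ}{d\lambda}}\right)^2 d\lambda = \int \left(\sqrt{\tfrac{dP}{d\lambda}\tfrac{d\lambda}{d\nu}} - \sqrt{\tfrac{dQ}{d\lambda}\tfrac{d\lambda}{d\nu}}\right)^2 d\nu = \int \left(\sqrt{\tfrac{dP}{d\nu}} - \sqrt{\tfrac{dQ}{d\nu}}\right)^2 d\nu,

since the factor \tfrac{d\lambda}{d\nu} distributes under both square roots.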


Probability theory using Lebesgue measure

To define the Hellinger distance in terms of elementary probability theory, we take \lambda to be the Lebesgue measure, so that ''dP''/''dλ'' and ''dQ''/''dλ'' are simply probability density functions. If we denote the densities as ''f'' and ''g'', respectively, the squared Hellinger distance can be expressed as a standard calculus integral

:H^2(f,g) = \frac{1}{2} \int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \, dx = 1 - \int \sqrt{f(x)\, g(x)} \, dx,

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1. The Hellinger distance ''H''(''P'', ''Q'') satisfies the property (derivable from the Cauchy–Schwarz inequality)

:0 \le H(P,Q) \le 1.
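As an illustrative sketch (not part of the original article), the equivalence of the two integral forms can be checked numerically, here for two made-up Gaussian densities using SciPy:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Two example densities: N(0, 1) and N(1, 2^2) (arbitrary choices).
f = norm(loc=0.0, scale=1.0).pdf
g = norm(loc=1.0, scale=2.0).pdf

# Form 1: (1/2) * integral of (sqrt(f) - sqrt(g))^2 dx
h2_form1, _ = quad(lambda x: 0.5 * (np.sqrt(f(x)) - np.sqrt(g(x)))**2,
                   -np.inf, np.inf)

# Form 2: 1 - integral of sqrt(f * g) dx (one minus the Bhattacharyya coefficient)
h2_form2 = 1.0 - quad(lambda x: np.sqrt(f(x) * g(x)), -np.inf, np.inf)[0]

print(h2_form1, h2_form2)  # the two values agree to integration tolerance
```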


Discrete distributions

For two discrete probability distributions P=(p_1, \ldots, p_k) and Q=(q_1, \ldots, q_k), their Hellinger distance is defined as

:H(P, Q) = \frac{1}{\sqrt{2}} \, \sqrt{\sum_{i=1}^k \left(\sqrt{p_i} - \sqrt{q_i}\right)^2},

which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.

:H(P, Q) = \frac{1}{\sqrt{2}} \, \bigl\| \sqrt{P} - \sqrt{Q} \bigr\|_2 .

Also, 1 - H^2(P,Q) = \sum_{i=1}^k \sqrt{p_i q_i}.
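A minimal NumPy sketch of the discrete formula (the example vectors are made up for illustration):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Euclidean norm of the difference of square-root vectors, scaled by 1/sqrt(2)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

print(hellinger([0.1, 0.4, 0.5], [0.3, 0.3, 0.4]))  # ~0.182
```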


Properties

The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space. The maximum distance 1 is achieved when ''P'' assigns probability zero to every set to which ''Q'' assigns a positive probability, and vice versa. Sometimes the factor 1/\sqrt{2} in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.

The Hellinger distance is related to the Bhattacharyya coefficient BC(P,Q), as it can be defined as

:H(P,Q) = \sqrt{1 - BC(P,Q)}.

Hellinger distances are used in the theory of sequential and asymptotic statistics.

The squared Hellinger distance between two normal distributions P \sim \mathcal{N}(\mu_1,\sigma_1^2) and Q \sim \mathcal{N}(\mu_2,\sigma_2^2) is:

:H^2(P, Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \, e^{-\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}}.

The squared Hellinger distance between two multivariate normal distributions P \sim \mathcal{N}(\mu_1,\Sigma_1) and Q \sim \mathcal{N}(\mu_2,\Sigma_2) is

:H^2(P, Q) = 1 - \frac{\det(\Sigma_1)^{1/4} \det(\Sigma_2)^{1/4}}{\det\left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{1/2}} \exp\left\{ -\frac{1}{8} (\mu_1-\mu_2)^T \left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{-1} (\mu_1-\mu_2) \right\}.
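As a quick numerical sanity check (an illustrative sketch, not from the original text), the univariate closed form can be compared against direct integration of the defining formula:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0  # arbitrary example parameters

# Closed form for N(mu1, s1^2) vs N(mu2, s2^2)
h2_closed = (1.0 - np.sqrt(2.0 * s1 * s2 / (s1**2 + s2**2))
             * np.exp(-0.25 * (mu1 - mu2)**2 / (s1**2 + s2**2)))

# Direct numerical evaluation: 1 - integral of sqrt(f * g)
f, g = norm(mu1, s1).pdf, norm(mu2, s2).pdf
h2_numeric = 1.0 - quad(lambda x: np.sqrt(f(x) * g(x)), -np.inf, np.inf)[0]

print(h2_closed, h2_numeric)  # should agree to integration tolerance
```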
The squared Hellinger distance between two exponential distributions P \sim \mathrm{Exp}(\alpha) and Q \sim \mathrm{Exp}(\beta) is:

:H^2(P, Q) = 1 - \frac{2\sqrt{\alpha\beta}}{\alpha+\beta}.

The squared Hellinger distance between two Weibull distributions P \sim \mathrm{W}(k,\alpha) and Q \sim \mathrm{W}(k,\beta) (where k is a common shape parameter and \alpha, \beta are the scale parameters respectively) is:

:H^2(P, Q) = 1 - \frac{2(\alpha\beta)^{k/2}}{\alpha^k+\beta^k}.
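For instance (an illustrative computation, not from the original text), both closed forms are one-liners, and the Weibull formula with shape k = 1 reproduces the exponential value:

```python
import numpy as np

def h2_exponential(a, b):
    # Closed form for Exp(a) vs Exp(b), rate parameters a and b
    return 1.0 - 2.0 * np.sqrt(a * b) / (a + b)

def h2_weibull(k, a, b):
    # Closed form for Weibull(k, a) vs Weibull(k, b) with common shape k
    return 1.0 - 2.0 * (a * b)**(k / 2.0) / (a**k + b**k)

print(h2_exponential(1.0, 2.0))   # 1 - 2*sqrt(2)/3 ~ 0.057
print(h2_weibull(1.0, 1.0, 2.0))  # same value: the expression is invariant
                                  # under inverting both parameters (rate <-> scale)
```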
The squared Hellinger distance between two Poisson distributions with rate parameters \alpha and \beta, so that P \sim \mathrm{Poisson}(\alpha) and Q \sim \mathrm{Poisson}(\beta), is:

:H^2(P,Q) = 1 - e^{-\frac{1}{2}(\sqrt{\alpha} - \sqrt{\beta})^2}.

The squared Hellinger distance between two beta distributions P \sim \text{Beta}(a_1,b_1) and Q \sim \text{Beta}(a_2, b_2) is:

:H^2(P,Q) = 1 - \frac{B\left(\frac{a_1+a_2}{2}, \frac{b_1+b_2}{2}\right)}{\sqrt{B(a_1,b_1)\, B(a_2,b_2)}},

where B is the beta function.

The squared Hellinger distance between two gamma distributions P \sim \text{Gamma}(a_1,b_1) and Q \sim \text{Gamma}(a_2, b_2) is:

:H^2(P,Q) = 1 - \Gamma\left(\frac{a_1+a_2}{2}\right) \left(\frac{b_1+b_2}{2}\right)^{-\frac{a_1+a_2}{2}} \sqrt{\frac{b_1^{a_1}\, b_2^{a_2}}{\Gamma(a_1)\,\Gamma(a_2)}},

where \Gamma is the gamma function.
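The Poisson case can likewise be cross-checked against the discrete-distribution definition by truncating the rapidly decaying tail (an illustrative sketch with made-up rates):

```python
import numpy as np
from scipy.stats import poisson

a, b = 3.0, 5.0  # example rate parameters

# Closed form
h2_closed = 1.0 - np.exp(-0.5 * (np.sqrt(a) - np.sqrt(b))**2)

# Discrete definition 1 - sum(sqrt(p_i * q_i)), truncated at n = 200
n = np.arange(200)
h2_sum = 1.0 - np.sum(np.sqrt(poisson.pmf(n, a) * poisson.pmf(n, b)))

print(h2_closed, h2_sum)  # agree up to the (tiny) truncation error
```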


Connection with total variation distance

The Hellinger distance H(P,Q) and the total variation distance (or statistical distance) \delta(P,Q) are related as follows:

:H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}\, H(P,Q)\,.

The constants in this inequality may change depending on which renormalization is chosen (1/2 or 1/\sqrt{2}). These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
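Both bounds are easy to observe numerically (a sketch with made-up discrete distributions, using \delta(P,Q) = \tfrac{1}{2}\sum_i |p_i - q_i|):

```python
import numpy as np

p = np.array([0.1, 0.4, 0.5])  # arbitrary example distributions
q = np.array([0.3, 0.3, 0.4])

h = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)  # Hellinger distance
tv = 0.5 * np.abs(p - q).sum()                            # total variation

assert h**2 <= tv <= np.sqrt(2) * h
print(h**2, tv, np.sqrt(2) * h)  # ~0.033 <= 0.2 <= ~0.258
```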


See also

* Statistical distance
* Kullback–Leibler divergence
* Bhattacharyya distance
* Total variation distance
* Fisher information metric

