In
probability theory
Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expre ...
, an
-divergence is a certain type of function
that measures the difference between two
probability distributions
In probability theory and statistics, a probability distribution is a function that gives the probabilities of occurrence of possible events for an experiment. It is a mathematical description of a random phenomenon in terms of its sample spac ...
and
. Many common divergences, such as
KL-divergence,
Hellinger distance
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of ''f''-divergence. The Hell ...
, and
total variation distance
In probability theory, the total variation distance is a statistical distance between probability distributions, and is sometimes called the statistical distance, statistical difference or variational distance.
Definition
Consider a measurable ...
, are special cases of
-divergence.
History
These divergences were introduced by
Alfréd Rényi
Alfréd Rényi (20 March 1921 – 1 February 1970) was a Hungarian mathematician known for his work in probability theory, though he also made contributions in combinatorics, graph theory, and number theory.
Life
Rényi was born in Budapest to A ...
in the same paper where he introduced the well-known
Rényi entropy
In information theory, the Rényi entropy is a quantity that generalizes various notions of Entropy (information theory), entropy, including Hartley entropy, Shannon entropy, collision entropy, and min-entropy. The Rényi entropy is named after Alf ...
. He proved that these divergences decrease in
Markov process
In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, ...
es. ''f''-divergences were studied further independently by , and and are sometimes known as Csiszár
-divergences, Csiszár–Morimoto divergences, or Ali–Silvey distances.
Definition
Non-singular case
Let
and
be two probability distributions over a space
, such that
, that is,
is
absolutely continuous
In calculus and real analysis, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity. The notion of absolute continuity allows one to obtain generalizations of the relationship betwe ...
with respect to
(meaning
wherever
). Then, for a
convex function
In mathematics, a real-valued function is called convex if the line segment between any two distinct points on the graph of a function, graph of the function lies above or on the graph between the two points. Equivalently, a function is conve ...