In
probability theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set ...
, an
-divergence is a function
that measures the difference between two
probability distributions
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon ...
and
. Many common divergences, such as
KL-divergence,
Hellinger distance, and
total variation distance In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance, statistical difference or variational dist ...
, are special cases of
-divergence.
History
These divergences were introduced by
Alfréd Rényi
Alfréd Rényi (20 March 1921 – 1 February 1970) was a Hungarian mathematician known for his work in probability theory, though he also made contributions in combinatorics, graph theory, and number theory.
Life
Rényi was born in Budapest to A ...
in the same paper where he introduced the well-known
Rényi entropy In information theory, the Rényi entropy is a quantity that generalizes various notions of entropy, including Hartley entropy, Shannon entropy, collision entropy, and min-entropy. The Rényi entropy is named after Alfréd Rényi, who looked for t ...
. He proved that these divergences decrease in
Markov process
A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...
es. ''f''-divergences were studied further independently by , and and are sometimes known as Csiszár
-divergences, Csiszár-Morimoto divergences, or Ali-Silvey distances.
Definition
Non-singular case
Let
and
be two probability distributions over a space
, such that
, that is,
is
absolutely continuous with respect to
. Then, for a
convex function
In mathematics, a real-valued function is called convex if the line segment between any two points on the graph of the function lies above the graph between the two points. Equivalently, a function is convex if its epigraph (the set of poi ...