In probability theory, statistics, and machine learning, the continuous Bernoulli distribution is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0, 1)$, defined on the unit interval $x \in [0, 1]$, by:

$$p(x \mid \lambda) \propto \lambda^x (1-\lambda)^{1-x}.$$

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders, for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $[0, 1]$-valued data. This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $\{0, 1\}$-valued data.

The continuous Bernoulli also defines an exponential family of distributions. Writing $\eta = \log\left(\lambda/(1-\lambda)\right)$ for the natural parameter, the density can be rewritten in canonical form: $p(x \mid \eta) \propto \exp(\eta x)$.
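The gap between the binary cross entropy loss and a true log-likelihood can be made concrete with a short numerical sketch. It assumes the closed form $C(\lambda) = 2\,\operatorname{artanh}(1-2\lambda)/(1-2\lambda)$ for the normalizing constant (with limit $C = 2$ at $\lambda = 1/2$), which is not stated in the text above; the function names `cb_log_norm` and `cb_log_pdf` are illustrative, not taken from any library.

```python
import numpy as np

def cb_log_norm(lam):
    """Log normalizing constant log C(lam) of the continuous Bernoulli.

    Assumed closed form: C(lam) = 2 * artanh(1 - 2*lam) / (1 - 2*lam),
    with the limiting value C = 2 at lam = 1/2.
    """
    lam = np.asarray(lam, dtype=float)
    # The closed form is 0/0 at lam = 1/2, so switch to the limit nearby.
    near_half = np.abs(lam - 0.5) < 1e-4
    safe = np.where(near_half, 0.25, lam)  # placeholder keeps the formula finite
    c = 2.0 * np.arctanh(1.0 - 2.0 * safe) / (1.0 - 2.0 * safe)
    return np.where(near_half, np.log(2.0), np.log(c))

def cb_log_pdf(x, lam):
    """Log density: negative binary cross entropy plus the log normalizer."""
    x = np.asarray(x, dtype=float)
    neg_bce = x * np.log(lam) + (1.0 - x) * np.log1p(-lam)
    return neg_bce + cb_log_norm(lam)

# Midpoint-rule check that the normalized density integrates to about 1.
grid = (np.arange(100_000) + 0.5) / 100_000
total_mass = np.exp(cb_log_pdf(grid, 0.9)).mean()
```

Dropping the `cb_log_norm` term recovers the usual negative binary cross entropy, which is why that loss alone is not a log-likelihood for $[0, 1]$-valued data.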


Related distributions


Bernoulli distribution

The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0, 1\}$ by the probability mass function:

$$p(x) = p^x (1-p)^{1-x},$$

where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0, 1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.
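The normalizing constant can be worked out by direct integration. The derivation below is a sketch added for illustration: it writes the parameter as $\lambda$, uses the natural parameter $\eta = \log(\lambda/(1-\lambda))$, and introduces the name $C(\lambda)$ for the constant, which is a label assumed here.

```latex
\begin{aligned}
\int_0^1 \lambda^x (1-\lambda)^{1-x}\,dx
  &= (1-\lambda)\int_0^1 e^{\eta x}\,dx,
     \qquad \eta = \log\frac{\lambda}{1-\lambda} \\
  &= (1-\lambda)\,\frac{e^{\eta}-1}{\eta}
   = \frac{2\lambda-1}{\eta} \\
\Rightarrow\quad C(\lambda)
  &= \frac{\eta}{2\lambda-1}
   = \frac{2\,\operatorname{artanh}(1-2\lambda)}{1-2\lambda}
     \quad \left(\lambda \neq \tfrac12\right),
     \qquad C\!\left(\tfrac12\right) = 2 .
\end{aligned}
```

The second equality for $C(\lambda)$ uses $2\,\operatorname{artanh}(z) = \log\frac{1+z}{1-z}$ with $z = 1-2\lambda$, and the value at $\lambda = \tfrac12$ is the limit as $\eta \to 0$.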


Beta distribution

The Beta distribution has the density function:

$$p(x) \propto x^{\alpha-1} (1-x)^{\beta-1},$$

which can be rewritten as:

$$p(x) \propto x_1^{\alpha_1-1} x_2^{\alpha_2-1},$$

where $\alpha_1, \alpha_2$ are positive scalar parameters, and $(x_1, x_2)$ represents an arbitrary point inside the 1-simplex, $\Delta^1 = \{(x_1, x_2) : x_1 > 0,\ x_2 > 0,\ x_1 + x_2 = 1\}$. Switching the role of the parameter and the argument in this density function, we obtain:

$$p(x) \propto \alpha_1^{x_1} \alpha_2^{x_2}.$$

This family is only identifiable up to the linear constraint $\alpha_1 + \alpha_2 = 1$, whence we obtain:

$$p(x) \propto \lambda^{x_1} (1-\lambda)^{x_2},$$

corresponding exactly to the continuous Bernoulli density.


Exponential distribution

An exponential distribution restricted to the unit interval is equivalent to a continuous Bernoulli distribution. Truncating an exponential distribution with rate $r > 0$ to $[0, 1]$ gives a density proportional to $e^{-rx}$, which is the continuous Bernoulli with natural parameter $\eta = -r$, that is, $\lambda = 1/(1 + e^{r})$.
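This equivalence can be checked numerically. The sketch below assumes the parameter mapping $\lambda = 1/(1 + e^{r})$ for an exponential rate $r > 0$, obtained by matching $e^{-rx}$ against $\exp(\eta x)$ with $\eta = \log(\lambda/(1-\lambda))$, together with the closed-form normalizing constant $\eta/(2\lambda - 1)$. The function names are illustrative only.

```python
import numpy as np

def cb_pdf(x, lam):
    """Continuous Bernoulli density, valid for lam != 1/2."""
    eta = np.log(lam) - np.log1p(-lam)   # natural parameter log(lam / (1 - lam))
    norm = eta / (2.0 * lam - 1.0)       # assumed closed-form normalizing constant
    return norm * lam**x * (1.0 - lam) ** (1.0 - x)

def truncated_exp_pdf(x, rate):
    """Exponential(rate) density renormalized to the unit interval [0, 1]."""
    return rate * np.exp(-rate * x) / (1.0 - np.exp(-rate))

rate = 3.0
lam = 1.0 / (1.0 + np.exp(rate))         # solves log(lam / (1 - lam)) = -rate
x = np.linspace(0.0, 1.0, 11)

# Largest pointwise difference between the two densities on the grid.
max_gap = np.max(np.abs(cb_pdf(x, lam) - truncated_exp_pdf(x, rate)))
print(max_gap)
```

Because an exponential rate is positive, this construction only reaches $\lambda < 1/2$; values $\lambda > 1/2$ correspond to a density that increases on $[0, 1]$.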


Continuous categorical distribution

The multivariate generalization of the continuous Bernoulli is called the continuous categorical distribution. [Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 37th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).]

