Variational message passing (VMP) is an
approximate inference Approximate inference methods make it possible to learn realistic models from big data by trading off computation time for accuracy, when exact learning and inference are computationally intractable.
Major methods classes
*Laplace's approximation ...
technique for continuous- or discrete-valued
Bayesian networks
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bay ...
, with
conjugate-exponential parents, developed by John Winn. VMP was developed as a means of generalizing the approximate
variational methods
The calculus of variations (or Variational Calculus) is a field of mathematical analysis that uses variations, which are small changes in functions
and functionals, to find maxima and minima of functionals: mappings from a set of functions t ...
used by such techniques as
latent Dirichlet allocation
In natural language processing, Latent Dirichlet Allocation (LDA) is a generative statistical model that explains a set of observations through unobserved groups, and each group explains why some parts of the data are similar. The LDA is an exa ...
, and works by updating an approximate distribution at each node through messages in the node's
Markov blanket
In statistics and machine learning, when one wants to infer a random variable with a set of variables, usually a subset is enough, and other variables are useless. Such a subset that contains all the useful information is called a Markov blanket. ...
.
Likelihood lower bound
Given some set of hidden variables
and observed variables
, the goal of approximate inference is to lower-bound the probability that a graphical model is in the configuration
. Over some probability distribution
(to be defined later),
:
.
So, if we define our lower bound to be
:
,
then the likelihood is simply this bound plus the
relative entropy
Relative may refer to:
General use
*Kinship and family, the principle binding the most basic social units society. If two people are connected by circumstances of birth, they are said to be ''relatives''
Philosophy
*Relativism, the concept that ...
between
and
. Because the relative entropy is non-negative, the function
defined above is indeed a lower bound of the log likelihood of our observation
. The distribution
will have a simpler character than that of
because marginalizing over
is intractable for all but the simplest of
graphical models
''Graphical Models'' is an academic journal in computer graphics and geometry processing publisher by Elsevier. , its editor-in-chief is Jorg Peters of the University of Florida.
History
This journal has gone through multiple names. Founded in 1 ...
. In particular, VMP uses a factorized distribution
:
where
is a disjoint part of the graphical model.
Determining the update rule
The likelihood estimate needs to be as large as possible; because it's a lower bound, getting closer
improves the approximation of the log likelihood. By substituting in the factorized version of
,
, parameterized over the hidden nodes
as above, is simply the negative
relative entropy
Relative may refer to:
General use
*Kinship and family, the principle binding the most basic social units society. If two people are connected by circumstances of birth, they are said to be ''relatives''
Philosophy
*Relativism, the concept that ...
between
and
plus other terms independent of
if
is defined as
:
,
where
is the expectation over all distributions
except
. Thus, if we set
to be
, the bound
is maximized.
Messages in variational message passing
Parents send their children the expectation of their
sufficient statistic
In statistics, a statistic is ''sufficient'' with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the pa ...
while children send their parents their
natural parameter
In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate ...
, which also requires messages to be sent from the co-parents of the node.
Relationship to exponential families
Because all nodes in VMP come from
exponential families
In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate ...
and all parents of nodes are
conjugate to their children nodes, the expectation of the
sufficient statistic
In statistics, a statistic is ''sufficient'' with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the pa ...
can be computed from the
normalization factor.
VMP algorithm
The algorithm begins by computing the expected value of the sufficient statistics for that vector. Then, until the likelihood converges to a stable value (this is usually accomplished by setting a small threshold value and running the algorithm until it increases by less than that threshold value), do the following at each node:
#Get all messages from parents.
#Get all messages from children (this might require the children to get messages from the co-parents).
#Compute the expected value of the nodes sufficient statistics.
Constraints
Because every child must be conjugate to its parent, this has limited the types of distributions that can be used in the model. For example, the parents of a
Gaussian distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu i ...
must be a
Gaussian distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu i ...
(corresponding to the
Mean
There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value ( magnitude and sign) of a given data set.
For a data set, the '' ari ...
) and a
gamma distribution
In probability theory and statistics, the gamma distribution is a two- parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-square distribution are special cases of the gamma dis ...
(corresponding to the precision, or one over
in more common parameterizations). Discrete variables can have
Dirichlet
Johann Peter Gustav Lejeune Dirichlet (; 13 February 1805 – 5 May 1859) was a German mathematician who made deep contributions to number theory (including creating the field of analytic number theory), and to the theory of Fourier series and ...
parents, and
Poisson and
exponential
Exponential may refer to any of several mathematical topics related to exponentiation, including:
*Exponential function, also:
**Matrix exponential, the matrix analogue to the above
*Exponential decay, decrease at a rate proportional to value
* Exp ...
nodes must have
gamma parents. More recently, VMP has been extended to handle models that violate this conditional conjugacy constraint.
[
]
References
*
*{{Cite thesis , type=PhD , title=Variational Algorithms for Approximate Bayesian Inference , url=http://www.cs.toronto.edu/~beal/thesis/beal03.pdf , last=Beal , first=M.J. , year=2003 , publisher=Gatsby Computational Neuroscience Unit, University College London , access-date=2007-02-15 , archive-url=https://web.archive.org/web/20050428173705/http://www.cs.toronto.edu/~beal/thesis/beal03.pdf , archive-date=2005-04-28 , url-status=dead
External links
Infer.NET an inference framework which includes an implementation of VMP with examples.
dimple an open-source inference system supporting VMP.
*A
older implementationof VMP with usage examples.
Bayesian networks