Variational Message Passing

Variational message passing (VMP) is an approximate inference technique for continuous- or discrete-valued Bayesian networks with conjugate-exponential parents, developed by John Winn. VMP was developed as a means of generalizing the approximate variational methods used by techniques such as latent Dirichlet allocation, and works by updating an approximate distribution at each node through messages in the node's Markov blanket.


Likelihood lower bound

Given some set of hidden variables H and observed variables V, the goal of approximate inference is to maximize a lower bound on the log probability that a graphical model assigns to the observed configuration V. Over some probability distribution Q (to be defined later),

: \ln P(V) = \sum_H Q(H) \ln \frac{P(H,V)}{P(H \mid V)} = \sum_{H} Q(H) \Bigg[ \ln \frac{P(H,V)}{Q(H)} - \ln \frac{P(H \mid V)}{Q(H)} \Bigg].

So, if we define our lower bound to be

: L(Q) = \sum_{H} Q(H) \ln \frac{P(H,V)}{Q(H)},

then the log likelihood is this bound plus the relative entropy between Q and the posterior P(H \mid V):

: \ln P(V) = L(Q) + D_{\mathrm{KL}}\bigl(Q \,\|\, P(H \mid V)\bigr).

Because the relative entropy is non-negative, the function L defined above is indeed a lower bound on the log likelihood of our observation V. The distribution Q is chosen to have a simpler character than that of P because marginalizing over P is intractable for all but the simplest graphical models. In particular, VMP uses a factorized distribution

: Q(H) = \prod_i Q_i(H_i),

where each H_i is a disjoint part of the graphical model.
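The decomposition above can be checked numerically on a toy model. The sketch below uses a made-up joint table over one binary hidden variable (all names and numbers are illustrative) and verifies that \ln P(V) = L(Q) + D_{\mathrm{KL}}(Q \| P(H \mid V)) for an arbitrary Q:

```python
import math

# Hypothetical tiny model: one binary hidden variable H, one observed V.
# The joint P(H, V) is specified directly as a table (made-up numbers).
P_joint = {(0, 0): 0.3, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.4}

V = 1  # observed value
P_V = sum(p for (h, v), p in P_joint.items() if v == V)   # marginal P(V)
P_post = {h: P_joint[(h, V)] / P_V for h in (0, 1)}       # posterior P(H | V)

# An arbitrary approximating distribution Q(H)
Q = {0: 0.5, 1: 0.5}

# Lower bound L(Q) = sum_H Q(H) ln [ P(H, V) / Q(H) ]
L = sum(Q[h] * math.log(P_joint[(h, V)] / Q[h]) for h in (0, 1))

# Relative entropy KL(Q || P(H | V))
KL = sum(Q[h] * math.log(Q[h] / P_post[h]) for h in (0, 1))

# The identity ln P(V) = L(Q) + KL holds exactly, and L is a lower bound
assert abs(math.log(P_V) - (L + KL)) < 1e-12
assert L <= math.log(P_V)
```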


Determining the update rule

The likelihood estimate needs to be as large as possible; because it is a lower bound, getting closer to \log P(V) improves the approximation of the log likelihood. By substituting in the factorized version of Q, the bound L(Q), parameterized over the hidden nodes H_i as above, is simply the negative relative entropy between Q_j and Q_j^* plus other terms independent of Q_j, if Q_j^* is defined as

: Q_j^*(H_j) = \frac{1}{Z_j} e^{\mathbb{E}_{-j}[\ln P(H,V)]},

where \mathbb{E}_{-j}[\ln P(H,V)] is the expectation of \ln P(H,V) over all distributions Q_i except Q_j, and Z_j is a normalizing constant. Thus, if we set Q_j to be Q_j^*, the bound L is maximized with respect to that factor.
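This coordinate update can be sketched for a made-up model with two binary hidden variables and the observation already clamped into the joint table (all numbers and names are illustrative). Updating one factor to Q_j^* cannot decrease the bound:

```python
import math

# Hypothetical joint P(H1, H2, V = v_obs) over two binary hidden variables,
# with the observed value already clamped (made-up, unnormalized numbers).
P = {(0, 0): 0.05, (0, 1): 0.25, (1, 0): 0.40, (1, 1): 0.30}

Q1 = {0: 0.5, 1: 0.5}  # initial factors of Q(H) = Q1(H1) Q2(H2)
Q2 = {0: 0.5, 1: 0.5}

def bound(Q1, Q2):
    """L(Q) = sum_H Q(H) ln [ P(H, V) / Q(H) ] for the factorized Q."""
    return sum(Q1[a] * Q2[b] * math.log(P[(a, b)] / (Q1[a] * Q2[b]))
               for a in (0, 1) for b in (0, 1))

def update_Q1(Q2):
    """Q1*(h1) proportional to exp( E_{Q2}[ ln P(h1, H2, V) ] )."""
    log_q = {a: sum(Q2[b] * math.log(P[(a, b)]) for b in (0, 1))
             for a in (0, 1)}
    Z = sum(math.exp(v) for v in log_q.values())
    return {a: math.exp(v) / Z for a, v in log_q.items()}

before = bound(Q1, Q2)
Q1 = update_Q1(Q2)       # the update for Q2 is symmetric
after = bound(Q1, Q2)
assert after >= before - 1e-12   # a coordinate update never decreases L
```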


Messages in variational message passing

Parents send their children the expectation of their sufficient statistic, while children send their parents their natural parameters; computing a child-to-parent message also requires messages from the node's co-parents.
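Schematically, for a conjugate-exponential child Y with parent X and co-parents \mathrm{cp}_X(Y), the two message types can be written as follows (this notation is an illustration, not taken verbatim from the source):

```latex
% parent-to-child: the parent's expected sufficient statistic under Q_X
m_{X \to Y} = \langle u_X(X) \rangle_{Q_X}
% child-to-parent: a natural-parameter contribution, assembled from the
% child's expected sufficient statistic and the co-parent messages
m_{Y \to X} = \tilde{\eta}_{XY}\!\left( \langle u_Y(Y) \rangle_{Q_Y},\;
              \{\, m_{Z \to Y} : Z \in \mathrm{cp}_X(Y) \,\} \right)
```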


Relationship to exponential families

Because all nodes in VMP come from exponential families and all parents of nodes are conjugate to their child nodes, the expectation of the sufficient statistic can be computed from the normalization factor (specifically, as the gradient of its logarithm with respect to the natural parameters).
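For an exponential-family density p(x) = h(x)\exp(\eta \cdot u(x) - A(\eta)), this is the identity \mathbb{E}[u(x)] = \nabla_\eta A(\eta). The sketch below checks it numerically for the gamma distribution (parameter values are made up):

```python
import math

def A(eta1, eta2):
    # Log-normalizer of the gamma family in natural parameters
    # eta = (a - 1, -b):  A = ln Gamma(a) - a ln b   (shape a, rate b)
    a, b = eta1 + 1.0, -eta2
    return math.lgamma(a) - a * math.log(b)

a, b = 3.0, 2.0                      # made-up shape / rate
eta1, eta2 = a - 1.0, -b
h = 1e-6

# Central differences of A recover the expected sufficient statistics
# u(x) = (ln x, x) of the gamma distribution.
E_lnx = (A(eta1 + h, eta2) - A(eta1 - h, eta2)) / (2 * h)   # E[ln x]
E_x   = (A(eta1, eta2 + h) - A(eta1, eta2 - h)) / (2 * h)   # E[x]

# Closed forms: E[x] = a/b and E[ln x] = psi(a) - ln b, with the digamma
# function psi approximated by differentiating lgamma (stdlib has no digamma).
psi_a = (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)
assert abs(E_x - a / b) < 1e-4
assert abs(E_lnx - (psi_a - math.log(b))) < 1e-4
```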


VMP algorithm

The algorithm begins by computing the expected value of the sufficient statistics for each node. Then, until the lower bound converges to a stable value (usually accomplished by setting a small threshold and running the algorithm until the bound increases by less than that threshold), do the following at each node:

# Get all messages from the parents.
# Get all messages from the children (this might require the children to get messages from their co-parents).
# Compute the expected value of the node's sufficient statistics.
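The steps above can be sketched for the classic example of a univariate Gaussian with unknown mean and precision, using a mean-field posterior Q(\mu)Q(\tau). The closed-form updates below follow from Gaussian and gamma conjugacy; prior values and variable names are illustrative, not from the source:

```python
import random, math

# Model: x_i ~ N(mu, 1/tau), mu ~ N(m0, 1/l0), tau ~ Gamma(a0, b0).
random.seed(0)
true_mu, true_tau = 2.0, 4.0
data = [random.gauss(true_mu, 1 / math.sqrt(true_tau)) for _ in range(500)]

N = len(data)
sx, sxx = sum(data), sum(x * x for x in data)
m0, l0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3      # broad (vague) priors

E_tau = 1.0                                  # initial expected precision
for _ in range(50):
    # Message into Q(mu): the expected precision from Q(tau), plus the data
    l = l0 + N * E_tau
    m = (l0 * m0 + E_tau * sx) / l
    # Message into Q(tau): the expected sufficient statistics of mu
    E_mu, E_mu2 = m, m * m + 1 / l
    a = a0 + N / 2
    b = b0 + 0.5 * (sxx - 2 * E_mu * sx + N * E_mu2)
    E_tau = a / b                            # mean of Gamma(a, b)

# E_mu and E_tau are now the approximate posterior means of mu and tau
print(E_mu, E_tau)
```

With broad priors the posterior mean of \mu is pulled almost entirely to the sample mean, and E[\tau] approaches the inverse of the sample variance, as expected.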


Constraints

Because every child must be conjugate to its parents, the types of distributions that can be used in the model are limited. For example, the parents of a Gaussian distribution must be a Gaussian distribution (corresponding to the mean) and a gamma distribution (corresponding to the precision, i.e. 1/\sigma^2 in more common parameterizations). Discrete variables can have Dirichlet parents, and Poisson and exponential nodes must have gamma parents. More recently, VMP has been extended to handle models that violate this conditional-conjugacy constraint.




External links


* Infer.NET, an inference framework which includes an implementation of VMP with examples.
* dimple, an open-source inference system supporting VMP.
* An older implementation of VMP with usage examples.