In mathematics, Laplace's approximation fits an un-normalised Gaussian approximation to a (twice differentiable) un-normalised target density. In Bayesian statistical inference this is useful to simultaneously approximate the posterior and the marginal likelihood; see also approximate inference.
The method works by matching the log density and curvature at a mode of the target density. For example, a (possibly non-linear) regression or classification model with data set \mathcal{D} comprising inputs x and outputs y has (unknown) parameter vector \theta of length D. The likelihood is denoted p(y|x,\theta) and the parameter prior p(\theta). The joint density of outputs and parameters p(y,\theta|x) is the object of inferential desire

: p(y,\theta|x)\;=\;p(y|x,\theta)\,p(\theta)\;=\;p(y|x)\,p(\theta|y,x)\;\simeq\;\tilde q(\theta)\;=\;Zq(\theta).

The joint is equal to the product of the likelihood and the prior and, by Bayes' rule, equal to the product of the marginal likelihood p(y|x) and the posterior p(\theta|y,x). Seen as a function of \theta the joint is an un-normalised density. In Laplace's approximation we approximate the joint by an un-normalised Gaussian \tilde q(\theta)=Zq(\theta), where we use q to denote the approximate density, \tilde q the un-normalised density and Z a constant (independent of \theta). Since the marginal likelihood p(y|x) doesn't depend on the parameter \theta and the posterior p(\theta|y,x) normalises over \theta, we can immediately identify them with Z and q(\theta) of our approximation, respectively. Laplace's approximation is

: p(y,\theta|x)\;\simeq\;p(y,\hat\theta|x)\exp\big(-\tfrac{1}{2}(\theta-\hat\theta)^\top S^{-1}(\theta-\hat\theta)\big)\;=\;\tilde q(\theta),

where we have defined

:\begin{align}
\hat\theta &\;=\; \operatorname{argmax}_\theta \log p(y,\theta|x),\\
S^{-1} &\;=\; -\left.\nabla_\theta\nabla_\theta\log p(y,\theta|x)\right|_{\theta=\hat\theta},
\end{align}

where \hat\theta is the location of a mode of the joint target density, also known as the maximum a posteriori or MAP point, and S^{-1} is the D\times D positive definite matrix of second derivatives of the negative log joint target density at the mode \theta=\hat\theta. Thus, the Gaussian approximation matches the value and the curvature of the un-normalised target density at the mode. The value of \hat\theta is usually found using a gradient-based method, e.g. Newton's method.
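As an illustration, the sketch below finds \hat\theta by Newton's method and evaluates the curvature S^{-1} at the mode for a small, hypothetical Bayesian logistic regression model with a standard normal prior on \theta (the data, model, and function names are invented for the example and do not come from any particular library):

```python
import numpy as np

# Hypothetical example: Bayesian logistic regression with prior theta ~ N(0, I).
# The un-normalised log target is  log p(y, theta | x) = log likelihood + log prior.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                           # inputs x (50 points, D = 2)
y = (X @ np.array([1.0, -2.0]) > 0).astype(float)      # synthetic binary outputs

def log_joint(theta):
    """log p(y, theta | x), known only up to the normaliser Z."""
    logits = X @ theta
    log_lik = np.sum(y * logits - np.log1p(np.exp(logits)))
    log_prior = -0.5 * theta @ theta - 0.5 * len(theta) * np.log(2 * np.pi)
    return log_lik + log_prior

def grad_neg_log_joint(theta):
    p = 1.0 / (1.0 + np.exp(-X @ theta))               # predicted class probabilities
    return -(X.T @ (y - p)) + theta                    # gradient of -log p(y, theta | x)

def hess_neg_log_joint(theta):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    W = p * (1.0 - p)
    return X.T @ (X * W[:, None]) + np.eye(len(theta)) # equals S^{-1} when evaluated at the mode

# Newton's method: repeatedly solve H step = g until the gradient vanishes.
theta_hat = np.zeros(2)
for _ in range(100):
    g = grad_neg_log_joint(theta_hat)
    if np.linalg.norm(g) < 1e-10:
        break
    theta_hat = theta_hat - np.linalg.solve(hess_neg_log_joint(theta_hat), g)

S_inv = hess_neg_log_joint(theta_hat)                  # curvature of the negative log joint at theta_hat
```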
In summary, we have

:\begin{align}
q(\theta) &\;=\; \mathcal{N}(\theta\,|\,\mu=\hat\theta,\Sigma=S),\\
\log Z &\;=\; \log p(y,\hat\theta|x) + \tfrac{1}{2}\log|S| + \tfrac{D}{2}\log(2\pi),
\end{align}

for the approximate posterior over \theta and the approximate log marginal likelihood, respectively.
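These summary formulas translate directly into code. The helper below (an illustrative function, not a library routine) assembles the mean and covariance of q(\theta) and the approximate log marginal likelihood \log Z from a mode and curvature obtained as in the sketch above:

```python
import numpy as np

def laplace_approximation(log_joint, theta_hat, S_inv):
    """Gaussian approximation q(theta) = N(mu = theta_hat, Sigma = S) and log Z.

    log_joint -- callable returning the un-normalised log target log p(y, theta | x)
    theta_hat -- mode of the log target (e.g. found by Newton's method)
    S_inv     -- Hessian of the negative log target evaluated at theta_hat
    """
    D = len(theta_hat)
    S = np.linalg.inv(S_inv)                       # covariance of q(theta)
    log_Z = (log_joint(theta_hat)
             + 0.5 * np.linalg.slogdet(S)[1]       # 0.5 * log|S|, computed stably
             + 0.5 * D * np.log(2 * np.pi))
    return theta_hat, S, log_Z
```

Called with the theta_hat and S_inv from the previous sketch, this returns the approximate posterior moments and the approximate log evidence for that model.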
In the special case of Bayesian linear regression with a Gaussian prior, the approximation is exact. The main weaknesses of Laplace's approximation are that it is symmetric around the mode and that it is very local: the entire approximation is derived from properties at a single point of the target density. Laplace's method is widely used; it was pioneered in the context of neural networks by David MacKay, and for Gaussian processes by Williams and Barber, see references.
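The exactness in the linear-Gaussian case can be checked numerically. In the minimal sketch below (a noise variance \sigma^2 and a standard normal prior on \theta are assumed; all names are illustrative) the quadratic log joint has its mode and curvature available in closed form, and the resulting \log Z coincides with the exact log marginal likelihood obtained by integrating \theta out analytically:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, D, sigma2 = 30, 3, 0.25
X = rng.normal(size=(n, D))
y = X @ rng.normal(size=D) + rng.normal(scale=np.sqrt(sigma2), size=n)

def log_joint(theta):
    """Gaussian likelihood N(y | X theta, sigma2 I) times Gaussian prior N(theta | 0, I)."""
    log_lik = multivariate_normal.logpdf(y, mean=X @ theta, cov=sigma2 * np.eye(n))
    log_prior = multivariate_normal.logpdf(theta, mean=np.zeros(D), cov=np.eye(D))
    return log_lik + log_prior

# Laplace approximation: the log joint is quadratic, so mode and curvature are exact and constant.
S_inv = X.T @ X / sigma2 + np.eye(D)                   # Hessian of the negative log joint
theta_hat = np.linalg.solve(S_inv, X.T @ y / sigma2)   # MAP point (= posterior mean here)
S = np.linalg.inv(S_inv)
log_Z_laplace = log_joint(theta_hat) + 0.5 * np.linalg.slogdet(S)[1] + 0.5 * D * np.log(2 * np.pi)

# Exact log marginal likelihood: integrating theta out gives y ~ N(0, X X^T + sigma2 I).
log_Z_exact = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=X @ X.T + sigma2 * np.eye(n))

print(np.allclose(log_Z_laplace, log_Z_exact))         # True: the approximation is exact here
```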


References


Sources

* MacKay, David J. C. (1992). "A Practical Bayesian Framework for Backpropagation Networks". Neural Computation.
* Williams, Christopher K. I.; Barber, David (1998). "Bayesian Classification with Gaussian Processes". IEEE Transactions on Pattern Analysis and Machine Intelligence.