In mathematics, Laplace's approximation fits an un-normalised Gaussian approximation to a (twice differentiable) un-normalised target density. In Bayesian statistical inference this is useful to simultaneously approximate the posterior and the marginal likelihood; see also approximate inference. The method works by matching the log density and curvature at a mode of the target density.
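To make the idea concrete, the following minimal Python sketch (not part of the original article) fits an un-normalised Gaussian to a one-dimensional target by matching value and curvature at the mode; the gamma-shaped target and the finite-difference curvature estimate are illustrative choices, not prescribed by the method.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Un-normalised target density f(t) = t^3 exp(-t) on t > 0
# (any twice-differentiable density works; this one is just for illustration).
def log_f(t):
    return 3 * np.log(t) - t

# Locate the mode by maximising log f (analytically, t_hat = 3 here).
res = minimize_scalar(lambda t: -log_f(t), bounds=(1e-6, 20), method="bounded")
t_hat = res.x

# Curvature of -log f at the mode, via a central second difference.
h = 1e-4
s_inv = -(log_f(t_hat + h) - 2 * log_f(t_hat) + log_f(t_hat - h)) / h**2

# Laplace: f(t) ~= f(t_hat) * exp(-0.5 * s_inv * (t - t_hat)**2), an
# un-normalised Gaussian with mean t_hat and variance 1 / s_inv.
print(f"mode = {t_hat:.3f}, variance = {1 / s_inv:.3f}")  # both approx. 3.0
```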
For example, a (possibly non-linear) regression or classification model with data set $\{(x_n, y_n)\}_{n=1}^{N}$ comprising inputs $x$ and outputs $y$ has (unknown) parameter vector $\theta$ of length $D$. The likelihood is denoted $p(y\mid x,\theta)$ and the parameter prior $p(\theta)$. The joint density of outputs and parameters $p(y,\theta\mid x)$ is the object of inferential desire

$$p(y,\theta\mid x)\;=\;p(y\mid x,\theta)\,p(\theta)\;=\;p(y\mid x)\,p(\theta\mid y,x).$$
The joint is equal to the product of the likelihood and the prior and, by Bayes' rule, equal to the product of the marginal likelihood $p(y\mid x)$ and the posterior $p(\theta\mid y,x)$. Seen as a function of $\theta$, the joint is an un-normalised density.

In Laplace's approximation we approximate the joint by an un-normalised Gaussian $\tilde q(\theta) = Z q(\theta)$, where we use $q$ to denote approximate density, $\tilde q$ for un-normalised density and $Z$ is a constant (independent of $\theta$). Since the marginal likelihood $p(y\mid x)$ doesn't depend on the parameter $\theta$ and the posterior $p(\theta\mid y,x)$ normalises over $\theta$, we can immediately identify them with $Z$ and $q(\theta)$ of our approximation, respectively. Laplace's approximation is

$$p(y,\theta\mid x)\;\approx\;p(y,\hat\theta\mid x)\exp\!\left(-\tfrac{1}{2}(\theta-\hat\theta)^{\top}S^{-1}(\theta-\hat\theta)\right)\;=\;\tilde q(\theta),$$
where we have defined

$$\hat\theta\;=\;\operatorname*{argmax}_{\theta}\,\log p(y,\theta\mid x),\qquad S^{-1}\;=\;-\left.\nabla_{\theta}\nabla_{\theta}\log p(y,\theta\mid x)\right|_{\theta=\hat\theta},$$

where $\hat\theta$ is the location of a mode of the joint target density, also known as the maximum a posteriori (MAP) point, and $S^{-1}$ is the $D\times D$ positive definite matrix of second derivatives of the negative log joint target density at the mode $\theta=\hat\theta$. Thus, the Gaussian approximation matches the value and the curvature of the un-normalised target density at the mode. The value of $\hat\theta$ is usually found using a gradient-based method, e.g. Newton's method. In summary, we have

$$q(\theta)\;=\;\mathcal{N}\!\left(\theta\,\middle|\,\mu=\hat\theta,\ \Sigma=S\right),\qquad \log Z\;=\;\log p(y,\hat\theta\mid x)+\tfrac{1}{2}\log|S|+\tfrac{D}{2}\log(2\pi)$$

for the approximate posterior over $\theta$ and the approximate log marginal likelihood, respectively.
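The whole procedure can be sketched in a few lines of Python. The sketch below is illustrative rather than canonical: it assumes a user-supplied function `log_joint` returning $\log p(y,\theta\mid x)$, uses a quasi-Newton optimiser (BFGS) in place of a full Newton iteration to find the mode, and estimates the Hessian by finite differences.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_approximation(log_joint, theta0):
    """Gaussian approximation N(theta_hat, S) and log Z for an
    un-normalised log target, e.g. log p(y, theta | x)."""
    D = len(theta0)

    # 1. MAP point: maximise the log joint (minimise its negation).
    theta_hat = minimize(lambda th: -log_joint(th), theta0, method="BFGS").x

    # 2. S^{-1}: Hessian of the negative log joint at the mode,
    #    estimated here by central finite differences.
    eps = 1e-4
    S_inv = np.empty((D, D))
    for i in range(D):
        for j in range(D):
            ei, ej = eps * np.eye(D)[i], eps * np.eye(D)[j]
            S_inv[i, j] = -(log_joint(theta_hat + ei + ej)
                            - log_joint(theta_hat + ei - ej)
                            - log_joint(theta_hat - ei + ej)
                            + log_joint(theta_hat - ei - ej)) / (4 * eps**2)

    # 3. Approximate posterior covariance S and log marginal likelihood:
    #    log Z = log p(y, theta_hat | x) + 0.5 log|S| + (D/2) log(2 pi).
    S = np.linalg.inv(S_inv)
    _, logdet_S = np.linalg.slogdet(S)
    log_Z = log_joint(theta_hat) + 0.5 * logdet_S + 0.5 * D * np.log(2 * np.pi)
    return theta_hat, S, log_Z
```

For a concrete model one would pass, for instance, `lambda th: log_likelihood(th) + log_prior(th)` (both names hypothetical) as `log_joint`.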
In the special case of Bayesian linear regression with a Gaussian prior, the approximation is exact. The main weaknesses of Laplace's approximation are that it is symmetric around the mode and that it is very local: the entire approximation is derived from properties at a single point of the target density. Laplace's method is widely used and was pioneered in the context of neural networks by David MacKay, and for Gaussian processes by Williams and Barber; see references.
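As a quick check of the exactness claim above, the following sketch (reusing the hypothetical `laplace_approximation` helper from the previous snippet, with synthetic data and assumed noise and prior scales) compares the Laplace log marginal likelihood with the closed-form log evidence of a Bayesian linear regression model:

```python
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
N, D, sigma, tau = 20, 3, 0.5, 1.0   # data size, noise std, prior std (assumed)
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + sigma * rng.normal(size=N)

# Log joint: Gaussian likelihood N(y | X theta, sigma^2 I) times
# Gaussian prior N(theta | 0, tau^2 I).
def log_joint(theta):
    ll = multivariate_normal.logpdf(y, mean=X @ theta, cov=sigma**2 * np.eye(N))
    lp = multivariate_normal.logpdf(theta, mean=np.zeros(D), cov=tau**2 * np.eye(D))
    return ll + lp

_, _, log_Z = laplace_approximation(log_joint, np.zeros(D))

# Closed-form evidence: marginally, y ~ N(0, sigma^2 I + tau^2 X X^T).
log_Z_exact = multivariate_normal.logpdf(
    y, mean=np.zeros(N), cov=sigma**2 * np.eye(N) + tau**2 * X @ X.T)
print(log_Z, log_Z_exact)  # agree up to finite-difference error
```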
References

* MacKay, David J. C. (2003). Information Theory, Inference and Learning Algorithms, chapter 27: Laplace's method. Cambridge University Press.
* Williams, Christopher K. I.; Barber, David (1998). "Bayesian classification with Gaussian processes". IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (12): 1342–1351.