In
computational statistics
Computational statistics, or statistical computing, is the study which is the intersection of statistics and computer science, and refers to the statistical methods that are enabled by using computational methods. It is the area of computational ...
, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a
Markov chain Monte Carlo
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that ...
(MCMC) method for obtaining
random samples
In this statistics, quality assurance, and survey methodology, sampling is the selection of a subset or a statistical sample (termed sample for short) of individuals from within a statistical population to estimate characteristics of the whole ...
– sequences of random observations – from a
probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a
random walk
In mathematics, a random walk, sometimes known as a drunkard's walk, is a stochastic process that describes a path that consists of a succession of random steps on some Space (mathematics), mathematical space.
An elementary example of a rand ...
that has the target probability distribution as an
invariant measure
In mathematics, an invariant measure is a measure that is preserved by some function. The function may be a geometric transformation. For examples, circular angle is invariant under rotation, hyperbolic angle is invariant under squeeze mappin ...
:
* new states are proposed using (
overdamped
In physical systems, damping is the loss of energy of an oscillating system by dissipation. Damping is an influence within or upon an oscillatory system that has the effect of reducing or preventing its oscillation. Examples of damping include ...
)
Langevin dynamics
In physics, Langevin dynamics is an approach to the mathematical modeling of the dynamics of molecular systems using the Langevin equation. It was originally developed by French physicist Paul Langevin. The approach is characterized by the use o ...
, which use evaluations of the
gradient
In vector calculus, the gradient of a scalar-valued differentiable function f of several variables is the vector field (or vector-valued function) \nabla f whose value at a point p gives the direction and the rate of fastest increase. The g ...
of the target
probability density function
In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a Function (mathematics), function whose value at any given sample (or point) in the sample space (the s ...
;
* these proposals are accepted or rejected using the
Metropolis–Hastings algorithm
In statistics and statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult. New sample ...
, which uses evaluations of the target probability density (but not its gradient).
Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by
Julian Besag
Julian Ernst Besag Fellow of the Royal Society, FRS (26 March 1945 – 6 August 2010) was a British statistician known chiefly for his work in spatial statistics (including its applications to epidemiology, image analysis and agricultural science ...
in 1994,
(although the method
Smart Monte Carlo was already introduced in 1978
) and its properties were examined in detail by
Gareth Roberts together with
Richard Tweedie
Richard Lewis Tweedie (22 August 1947 – 7 June 2001) was an Australian statistician.
Education
After having completed his undergraduate studies and a Master of Arts at the Australian National University, Tweedie moved to Cambridge University, ...
and
Jeff Rosenthal.
Many variations and refinements have been introduced since then, e.g. the
manifold
In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point. More precisely, an n-dimensional manifold, or ''n-manifold'' for short, is a topological space with the property that each point has a N ...
variant of Girolami and Calderhead (2011).
The method is equivalent to using the
Hamiltonian Monte Carlo
The Hamiltonian Monte Carlo algorithm (originally known as hybrid Monte Carlo) is a Markov chain Monte Carlo method for obtaining a sequence of random samples whose distribution converges to a target probability distribution that is difficult to ...
(hybrid Monte Carlo) algorithm with only a single discrete time step.
Further details
Let
denote a probability density function on
, one from which it is desired to draw an ensemble of
independent and identically distributed
Independent or Independents may refer to:
Arts, entertainment, and media Artist groups
* Independents (artist group), a group of modernist painters based in Pennsylvania, United States
* Independentes (English: Independents), a Portuguese artist ...
samples. We consider the overdamped Langevin
Itô diffusion
In mathematics – specifically, in stochastic analysis – an Itô diffusion is a solution to a specific type of stochastic differential equation. That equation is similar to the Langevin equation used in physics to describe the Brownian motion ...
:
driven by the time derivative of a standard
Brownian motion
Brownian motion is the random motion of particles suspended in a medium (a liquid or a gas). The traditional mathematical formulation of Brownian motion is that of the Wiener process, which is often called Brownian motion, even in mathematical ...
. (Note that another commonly-used normalization for this diffusion is
:
which generates the same dynamics.) In the limit as
, this probability distribution
of
approaches a stationary distribution, which is also invariant under the diffusion, which we denote
. It turns out that, in fact,
.
Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the
Euler–Maruyama method
In Itô calculus, the Euler–Maruyama method (also simply called the Euler method) is a method for the approximate numerical analysis, numerical solution of a stochastic differential equation (SDE). It is an extension of the Euler method for ord ...
with a fixed time step
. We set
and then recursively define an approximation
to the true solution
by
:
where each
is an independent draw from a
multivariate normal distribution
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One d ...
on
with
mean
A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...
0 and
covariance matrix
In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of ...
equal to the
identity matrix
In linear algebra, the identity matrix of size n is the n\times n square matrix with ones on the main diagonal and zeros elsewhere. It has unique properties, for example when the identity matrix represents a geometric transformation, the obje ...
. Note that
is normally distributed with mean
and covariance equal to
times the
identity matrix.
In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates
according to the update rule
:
MALA incorporates an additional step. We consider the above update rule as defining a ''proposal''
for a new state,
:
This proposal is accepted or rejected according to the Metropolis-Hastings algorithm: set
:
where
:
is the transition probability density from
to
(note that, in general
). Let
be drawn from the
continuous uniform distribution
In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that li ...
on the interval