In
computational statistics, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a
Markov chain Monte Carlo
In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution. By constructing a Markov chain that has the desired distribution as its equilibrium distribution, one can obtain ...
(MCMC) method for obtaining
random samples
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population. Statisticians attempt ...
– sequences of random observations – from a
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a
random walk
In mathematics, a random walk is a random process that describes a path that consists of a succession of random steps on some mathematical space.
An elementary example of a random walk is the random walk on the integer number line \mathbb Z ...
that has the target probability distribution as an
invariant measure In mathematics, an invariant measure is a measure that is preserved by some function. The function may be a geometric transformation. For examples, circular angle is invariant under rotation, hyperbolic angle is invariant under squeeze mapping, an ...
:
* new states are proposed using (
overdamped
Damping is an influence within or upon an oscillatory system that has the effect of reducing or preventing its oscillation. In physical systems, damping is produced by processes that dissipate the energy stored in the oscillation. Examples incl ...
)
Langevin dynamics
In physics, Langevin dynamics is an approach to the mathematical modeling of the dynamics of molecular systems. It was originally developed by French physicist Paul Langevin. The approach is characterized by the use of simplified models while acco ...
, which use evaluations of the
gradient
In vector calculus, the gradient of a scalar-valued differentiable function of several variables is the vector field (or vector-valued function) \nabla f whose value at a point p is the "direction and rate of fastest increase". If the gradi ...
of the target
probability density function
In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can ...
;
* these proposals are accepted or rejected using the
Metropolis–Hastings algorithm
In statistics and statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult. This seque ...
, which uses evaluations of the target probability density (but not its gradient).
Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by
Julian Besag
Julian Ernst Besag FRS (26 March 1945 – 6 August 2010) was a British statistician known chiefly for his work in spatial statistics (including its applications to epidemiology, image analysis and agricultural science), and Bayesian inferenc ...
in 1994,
(although the method
Smart Monte Carlo was already introduced in 1978
) and its properties were examined in detail by
Gareth Roberts together with
Richard Tweedie
Richard Lewis Tweedie (22 August 1947 – 7 June 2001) was an Australian statistician.
Education
After having completed his undergraduate studies and a Master of Arts at the Australian National University, Tweedie moved to Cambridge University, ...
and
Jeff Rosenthal
Jeffrey Seth Rosenthal (born October 13, 1967 in Scarborough, Toronto, Scarborough, Ontario) is a Canadian statistician and nonfiction author. He is a professor in the University of Toronto's Department of Statistics, cross-appointed with Univer ...
.
Many variations and refinements have been introduced since then, e.g. the
manifold
In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point. More precisely, an n-dimensional manifold, or ''n-manifold'' for short, is a topological space with the property that each point has a n ...
variant of Girolami and Calderhead (2011).
The method is equivalent to using the
Hamiltonian Monte Carlo The Hamiltonian Monte Carlo algorithm (originally known as hybrid Monte Carlo) is a Markov chain Monte Carlo method for obtaining a sequence of random samples which converge to being distributed according to a target probability distribution for whi ...
(hybrid Monte Carlo) algorithm with only a single discrete time step.
Further details
Let
denote a probability density function on
, one from which it is desired to draw an ensemble of
independent and identically distributed samples. We consider the overdamped Langevin
Itô diffusion
In mathematics – specifically, in stochastic analysis – an Itô diffusion is a solution to a specific type of stochastic differential equation. That equation is similar to the Langevin equation used in physics to describe the Brownian motion o ...
:
driven by the time derivative of a standard
Brownian motion . (Note that another commonly-used normalization for this diffusion is
:
which generates the same dynamics.) In the limit as
, this probability distribution
of
approaches a stationary distribution, which is also invariant under the diffusion, which we denote
. It turns out that, in fact,
.
Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the
Euler–Maruyama method with a fixed time step
. We set
and then recursively define an approximation
to the true solution
by
:
where each
is an independent draw from a
multivariate normal distribution
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One d ...
on
with
mean
There are several kinds of mean in mathematics, especially in statistics. Each mean serves to summarize a given group of data, often to better understand the overall value (magnitude and sign) of a given data set.
For a data set, the ''arithme ...
0 and
covariance matrix
In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of ...
equal to the
identity matrix
In linear algebra, the identity matrix of size n is the n\times n square matrix with ones on the main diagonal and zeros elsewhere.
Terminology and notation
The identity matrix is often denoted by I_n, or simply by I if the size is immaterial o ...
. Note that
is normally distributed with mean
and covariance equal to
times the
identity matrix.
In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates
according to the update rule
:
MALA incorporates an additional step. We consider the above update rule as defining a ''proposal''
for a new state,
:
This proposal is accepted or rejected according to the Metropolis-Hastings algorithm: set
:
where
:
is the transition probability density from
to
(note that, in general
). Let
be drawn from the
continuous uniform distribution
In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of symmetric probability distributions. The distribution describes an experiment where there is an arbitrary outcome that lies betw ...
on the interval