computational statistics Computational statistics, or statistical computing, is the study which is the intersection of statistics and computer science, and refers to the statistical methods that are enabled by using computational methods. It is the area of computational ...

, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a

Markov chain Monte Carlo In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that ...

(MCMC) method for obtaining

random samples In this statistics, quality assurance, and survey methodology, sampling is the selection of a subset or a statistical sample (termed sample for short) of individuals from within a statistical population to estimate characteristics of the whole ...

– sequences of random observations – from a

probability distribution In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...

for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a

random walk In mathematics, a random walk, sometimes known as a drunkard's walk, is a stochastic process that describes a path that consists of a succession of random steps on some Space (mathematics), mathematical space. An elementary example of a rand ...

that has the target probability distribution as an

invariant measure In mathematics, an invariant measure is a measure that is preserved by some function. The function may be a geometric transformation. For examples, circular angle is invariant under rotation, hyperbolic angle is invariant under squeeze mappin ...

: * new states are proposed using (

overdamped In physical systems, damping is the loss of energy of an oscillating system by dissipation. Damping is an influence within or upon an oscillatory system that has the effect of reducing or preventing its oscillation. Examples of damping include ...

)

Langevin dynamics In physics, Langevin dynamics is an approach to the mathematical modeling of the dynamics of molecular systems using the Langevin equation. It was originally developed by French physicist Paul Langevin. The approach is characterized by the use o ...

, which use evaluations of the

gradient In vector calculus, the gradient of a scalar-valued differentiable function f of several variables is the vector field (or vector-valued function) \nabla f whose value at a point p gives the direction and the rate of fastest increase. The g ...

of the target

probability density function In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a Function (mathematics), function whose value at any given sample (or point) in the sample space (the s ...

; * these proposals are accepted or rejected using the

Metropolis–Hastings algorithm In statistics and statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult. New sample ...

, which uses evaluations of the target probability density (but not its gradient). Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by

Julian Besag Julian Ernst Besag Fellow of the Royal Society, FRS (26 March 1945 – 6 August 2010) was a British statistician known chiefly for his work in spatial statistics (including its applications to epidemiology, image analysis and agricultural science ...

in 1994, (although the method Smart Monte Carlo was already introduced in 1978 ) and its properties were examined in detail by Gareth Roberts together with

Richard Tweedie Richard Lewis Tweedie (22 August 1947 – 7 June 2001) was an Australian statistician. Education After having completed his undergraduate studies and a Master of Arts at the Australian National University, Tweedie moved to Cambridge University, ...

and Jeff Rosenthal. Many variations and refinements have been introduced since then, e.g. the

manifold In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point. More precisely, an n-dimensional manifold, or ''n-manifold'' for short, is a topological space with the property that each point has a N ...

variant of Girolami and Calderhead (2011). The method is equivalent to using the

Hamiltonian Monte Carlo The Hamiltonian Monte Carlo algorithm (originally known as hybrid Monte Carlo) is a Markov chain Monte Carlo method for obtaining a sequence of random samples whose distribution converges to a target probability distribution that is difficult to ...

(hybrid Monte Carlo) algorithm with only a single discrete time step.

Further details

Let

\pi

denote a probability density function on

\mathbb^

, one from which it is desired to draw an ensemble of

independent and identically distributed Independent or Independents may refer to: Arts, entertainment, and media Artist groups * Independents (artist group), a group of modernist painters based in Pennsylvania, United States * Independentes (English: Independents), a Portuguese artist ...

samples. We consider the overdamped Langevin

Itô diffusion In mathematics – specifically, in stochastic analysis – an Itô diffusion is a solution to a specific type of stochastic differential equation. That equation is similar to the Langevin equation used in physics to describe the Brownian motion ...

\dot = \nabla \log \pi(X) + \sqrt \dot

driven by the time derivative of a standard

Brownian motion Brownian motion is the random motion of particles suspended in a medium (a liquid or a gas). The traditional mathematical formulation of Brownian motion is that of the Wiener process, which is often called Brownian motion, even in mathematical ...

W

. (Note that another commonly-used normalization for this diffusion is :

\dot = \frac \nabla \log \pi(X) + \dot,

which generates the same dynamics.) In the limit as

t \to \infty

, this probability distribution

\rho(t)

X(t)

approaches a stationary distribution, which is also invariant under the diffusion, which we denote

\rho_\infty

. It turns out that, in fact,

\rho_\infty = \pi

. Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the

Euler–Maruyama method In Itô calculus, the Euler–Maruyama method (also simply called the Euler method) is a method for the approximate numerical analysis, numerical solution of a stochastic differential equation (SDE). It is an extension of the Euler method for ord ...

with a fixed time step

\tau > 0

. We set

X_0 := x_0

and then recursively define an approximation

X_k

to the true solution

X(k \tau)

by :

X_ := X_k + \tau \nabla \log \pi(X_k) + \sqrt \xi_k,

where each

\xi_

is an independent draw from a

multivariate normal distribution In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One d ...

\mathbb^

with

mean A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...

0 and

covariance matrix In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of ...

equal to the

d \times d

identity matrix In linear algebra, the identity matrix of size n is the n\times n square matrix with ones on the main diagonal and zeros elsewhere. It has unique properties, for example when the identity matrix represents a geometric transformation, the obje ...

. Note that

X_

is normally distributed with mean

X_k + \tau \nabla \log \pi(X_k)

and covariance equal to

2 \tau

times the

d \times d

identity matrix. In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates

X_k

according to the update rule :

X_ := X_k + \tau \nabla \log \pi(X_k) + \sqrt \xi_k,

MALA incorporates an additional step. We consider the above update rule as defining a ''proposal''

\tilde_

for a new state, :

\tilde_ := X_k + \tau \nabla \log \pi(X_k) + \sqrt \xi_k.

This proposal is accepted or rejected according to the Metropolis-Hastings algorithm: set :

\alpha := \min \left\,

where :

q(x'\mid x) \propto \exp \left( - \frac \,  x' - x - \tau \nabla \log \pi(x) \, _2^2 \right)

is the transition probability density from

x

x'

(note that, in general

q(x'\mid x) \neq q(x\mid x')

). Let

u

be drawn from the

continuous uniform distribution In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that li ...

on the interval

, 1 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...

/math>. If

u \leq \alpha

, then the proposal is accepted, and we set

X_ := \tilde_

; otherwise, the proposal is rejected, and we set

X_ := X_k

. The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the

detailed balance The principle of detailed balance can be used in Kinetics (physics), kinetic systems which are decomposed into elementary processes (collisions, or steps, or elementary reactions). It states that at Thermodynamic equilibrium, equilibrium, each elem ...

conditions necessary for the existence of a unique, invariant, stationary distribution

\rho_ = \pi

. Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher

\pi

probability, which are then more likely to be accepted. On the other hand, when

\pi

is strongly

anisotropic Anisotropy () is the structural property of non-uniformity in different directions, as opposed to isotropy. An anisotropic object or pattern has properties that differ according to direction of measurement. For example, many materials exhibit ver ...

(i.e. it varies much more quickly in some directions than others), it is necessary to take

0 < \tau \ll 1

in order to properly capture the Langevin dynamics; the use of a positive-definite

preconditioning In mathematics, preconditioning is the application of a transformation, called the preconditioner, that conditions a given problem into a form that is more suitable for numerical solving methods. Preconditioning is typically related to reducing ...

matrix

A \in \mathbb^

can help to alleviate this problem, by generating proposals according to :

\tilde_ := X_k + \tau A \nabla \log \pi(X_k) + \sqrt \xi_k,

so that

\tilde_{k + 1}

has mean

X_k + \tau A \nabla \log \pi(X_k)

and covariance

2 \tau A

. For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be

0.574

; if it is discovered to be substantially different in practice,

\tau

should be modified accordingly.

References

Monte Carlo methods Markov chain Monte Carlo Sampling techniques