In statistics and statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a
probability distribution from which direct sampling is difficult. This sequence can be used to approximate the distribution (e.g. to generate a
histogram) or to
compute an integral (e.g. an
expected value). Metropolis–Hastings and other MCMC algorithms are generally used for sampling from multi-dimensional distributions, especially when the number of dimensions is high. For single-dimensional distributions, there are usually other methods (e.g.
adaptive rejection sampling) that can directly return independent samples from the distribution, and these are free from the problem of
autocorrelated samples that is inherent in MCMC methods.
History
The algorithm was named after Nicholas Metropolis and
W. K. Hastings. Metropolis was the first-listed author of the 1953 article ''
Equation of State Calculations by Fast Computing Machines'' together with
Arianna W. Rosenbluth,
Marshall Rosenbluth,
Augusta H. Teller and
Edward Teller. This article proposed the algorithm for the case of symmetrical proposal distributions, and for many years was known as the "Metropolis algorithm." In 1970, Hastings extended it to the more general case.
The generalized method eventually was given both names, although the first use of the term "Metropolis–Hastings algorithm" is unclear; it may have first appeared in a 1995 review by Chib and Greenberg.
Some controversy exists with regard to credit for development of the Metropolis algorithm. Metropolis, who was familiar with the computational aspects of the method, had coined the term "Monte Carlo" in an earlier article with Stanisław Ulam, and led the group in the Theoretical Division that designed and built the MANIAC I
computer used in the experiments in 1952. However, prior to 2003 there was no detailed account of the algorithm's development. Shortly before his death, Marshall Rosenbluth attended a 2003 conference at LANL marking the 50th anniversary of the 1953 publication. At this conference, Rosenbluth described the algorithm and its development in a presentation titled "Genesis of the Monte Carlo Algorithm for Statistical Mechanics". Further historical clarification is made by Gubernatis in a 2005 journal article recounting the 50th anniversary conference. Rosenbluth makes it clear that he and his wife Arianna did the work, and that Metropolis played no role in the development other than providing computer time.
This contradicts an account by Edward Teller, who states in his memoirs that the five authors of the 1953 article worked together for "days (and nights)". In contrast, the detailed account by Rosenbluth credits Teller with a crucial but early suggestion to "take advantage of statistical mechanics and take ensemble averages instead of following detailed kinematics". This, says Rosenbluth, started him thinking about the generalized Monte Carlo approach, a topic which he says he had discussed often with John von Neumann. Arianna Rosenbluth recounted (to Gubernatis in 2003) that Augusta Teller started the computer work, but that Arianna herself took it over and wrote the code from scratch. In an oral history recorded shortly before his death, Rosenbluth again credits Teller with posing the original problem, himself with solving it, and Arianna with programming the computer.
Intuition
The Metropolis–Hastings algorithm can draw samples from any probability distribution with probability density P(x), provided that we know a function f(x) proportional to the density P and the values of f(x) can be calculated. The requirement that f(x) must only be proportional to the density, rather than exactly equal to it, makes the Metropolis–Hastings algorithm particularly useful, because calculating the necessary normalization factor is often extremely difficult in practice.
The Metropolis–Hastings algorithm works by generating a sequence of sample values in such a way that, as more and more sample values are produced, the distribution of values more closely approximates the desired distribution. These sample values are produced iteratively, with the distribution of the next sample depending only on the current sample value (thus making the sequence of samples into a Markov chain). Specifically, at each iteration, the algorithm picks a candidate for the next sample value based on the current sample value. Then, with some probability, the candidate is either accepted (in which case the candidate value is used in the next iteration) or rejected (in which case the candidate value is discarded, and the current value is reused in the next iteration). The probability of acceptance is determined by comparing the values of the function f of the current and candidate sample values with respect to the desired distribution.
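As a rough illustration of this accept/reject loop, the following is a minimal Python sketch of a Metropolis sampler with a symmetric Gaussian proposal. The target here is a standard normal density deliberately left unnormalized (the 1/√(2π) factor is omitted) to show that only the ratio of density values matters; the function names and the proposal width are choices made for this example, not part of any reference implementation.

```python
import math
import random

def metropolis(f, x0, n_samples, step=1.0):
    """Metropolis sampler with a symmetric Gaussian proposal.

    f: a function proportional to the target density (need not be normalized).
    x0: starting point of the chain.
    step: standard deviation of the Gaussian proposal (example choice).
    """
    x = x0
    samples = []
    for _ in range(n_samples):
        # Symmetric proposal: a candidate centered at the current state.
        candidate = x + random.gauss(0.0, step)
        # Acceptance ratio uses only f(candidate) / f(current),
        # so any unknown normalization constant cancels out.
        alpha = f(candidate) / f(x)
        if random.random() < alpha:
            x = candidate  # accept: candidate becomes the next state
        # otherwise reject: the current value is reused as the next sample
        samples.append(x)
    return samples

# Unnormalized standard normal density (missing the 1/sqrt(2*pi) factor).
unnorm = lambda x: math.exp(-0.5 * x * x)
chain = metropolis(unnorm, x0=0.0, n_samples=10000)
```

Note that when the candidate is rejected, the current value is appended again; dropping rejected steps instead would bias the sampler away from the target distribution.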
For the purpose of illustration, the Metropolis algorithm, a special case of the Metropolis–Hastings algorithm where the proposal function is symmetric, is described below.
Metropolis algorithm (symmetric proposal distribution)
Let f(x) be a function that is proportional to the desired probability density function P(x) (a.k.a. a target distribution).
# Initialization: Choose an arbitrary point x_0 to be the first observation in the sample and choose an arbitrary probability density g(x | y) (sometimes written Q(x | y)) that suggests a candidate for the next sample value x, given the previous sample value y. In this section, g is assumed to be symmetric; in other words, it must satisfy g(x | y) = g(y | x). A usual choice is to let g(x | y) be a Gaussian distribution centered at y, so that points closer to y are more likely to be visited next, making the sequence of samples into a random walk. The function g is referred to as the ''proposal density'' or ''jumping distribution''.
# For each iteration ''t'':
#* ''Generate'' a candidate x′ for the next sample by picking from the distribution g(x′ | x_t).
#* ''Calculate'' the ''acceptance ratio'' α = f(x′) / f(x_t), which will be used to decide whether to accept or reject the candidate. Because ''f'' is proportional to the density of ''P'', we have that α = f(x′) / f(x_t) = P(x′) / P(x_t).
#* ''Accept or reject'':
#** Generate a uniform random number u ∈ [0, 1].