In the domain of
physics
Physics is the scientific study of matter, its Elementary particle, fundamental constituents, its motion and behavior through space and time, and the related entities of energy and force. "Physical science is that department of knowledge whi ...
and
probability
Probability is a branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an e ...
, a Markov random field (MRF), Markov network or undirected
graphical model is a set of
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...
s having a
Markov property
In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process, which means that its future evolution is independent of its history. It is named after the Russian mathematician Andrey Ma ...
described by an
undirected graph
In discrete mathematics, particularly in graph theory, a graph is a structure consisting of a set of objects where some pairs of the objects are in some sense "related". The objects are represented by abstractions called '' vertices'' (also call ...
. In other words, a
random field is said to be a
Markov Markov ( Bulgarian, ), Markova, and Markoff are common surnames used in Russia and Bulgaria. Notable people with the name include:
Academics
* Ivana Markova (1938–2024), Czechoslovak-British emeritus professor of psychology at the University of S ...
random field if it satisfies Markov properties. The concept originates from the
Sherrington–Kirkpatrick model
In condensed matter physics, a spin glass is a magnetic state characterized by randomness, besides cooperative behavior in freezing of Spin (physics), spins at a temperature called the "freezing temperature," ''T''f. In Ferromagnetism, ferroma ...
.
A Markov network or MRF is similar to a
Bayesian network
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Whi ...
in its representation of dependencies; the differences being that Bayesian networks are
directed and acyclic, whereas Markov networks are undirected and may be cyclic. Thus, a Markov network can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies ); on the other hand, it can't represent certain dependencies that a Bayesian network can (such as induced dependencies ). The underlying graph of a Markov random field may be finite or infinite.
When the
joint probability density of the random variables is strictly positive, it is also referred to as a Gibbs random field, because, according to the
Hammersley–Clifford theorem, it can then be represented by a
Gibbs measure for an appropriate (locally defined) energy function. The prototypical Markov random field is the
Ising model
The Ising model (or Lenz–Ising model), named after the physicists Ernst Ising and Wilhelm Lenz, is a mathematical models in physics, mathematical model of ferromagnetism in statistical mechanics. The model consists of discrete variables that r ...
; indeed, the Markov random field was introduced as the general setting for the Ising model.
In the domain of
artificial intelligence
Artificial intelligence (AI) is the capability of computer, computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of re ...
, a Markov random field is used to model various low- to mid-level tasks in
image processing
An image or picture is a visual representation. An image can be two-dimensional, such as a drawing, painting, or photograph, or three-dimensional, such as a carving or sculpture. Images may be displayed through other media, including a pr ...
and
computer vision
Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...
.
Definition
Given an undirected graph
, a set of random variables
indexed by
form a Markov random field with respect to
if they satisfy the local Markov properties:
:Pairwise Markov property: Any two non-adjacent variables are
conditionally independent given all other variables:
::
:Local Markov property: A variable is conditionally independent of all other variables given its neighbors:
::
:where
is the set of neighbors of
, and
is the
closed neighbourhood of
.
:Global Markov property: Any two subsets of variables are conditionally independent given a separating subset:
::
:where every path from a node in
to a node in
passes through
.
The Global Markov property is stronger than the Local Markov property, which in turn is stronger than the Pairwise one. However, the above three Markov properties are equivalent for positive distributions (those that assign only nonzero probabilities to the associated variables).
The relation between the three Markov properties is particularly clear in the following formulation:
* Pairwise: For any
not equal or adjacent,
.
* Local: For any
and
not containing or adjacent to
,
.
* Global: For any
not intersecting or adjacent,
.
Clique factorization
As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the
cliques of the graph.
Given a set of random variables
, let
be the
probability
Probability is a branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an e ...
of a particular field configuration
in
—that is,
is the probability of finding that the random variables
take on the particular value
. Because
is a set, the probability of
should be understood to be taken with respect to a ''joint distribution'' of the
.
If this joint density can be factorized over the cliques of
as
:
then
forms a Markov random field with respect to
. Here,
is the set of cliques of
. The definition is equivalent if only maximal cliques are used. The functions
are sometimes referred to as ''factor potentials'' or ''clique potentials''. Note, however, conflicting terminology is in use: the word ''potential'' is often applied to the logarithm of
. This is because, in
statistical mechanics
In physics, statistical mechanics is a mathematical framework that applies statistical methods and probability theory to large assemblies of microscopic entities. Sometimes called statistical physics or statistical thermodynamics, its applicati ...
,
has a direct interpretation as the
potential energy
In physics, potential energy is the energy of an object or system due to the body's position relative to other objects, or the configuration of its particles. The energy is equal to the work done against any restoring forces, such as gravity ...
of a
configuration
Configuration or configurations may refer to:
Computing
* Computer configuration or system configuration
* Configuration file, a software file used to configure the initial settings for a computer program
* Configurator, also known as choice board ...
.
Some MRF's do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probabilities, even if one, more appropriately, allows the infinite energies to act on the complete graph on
.
MRF's factorize if at least one of the following conditions is fulfilled:
* the density is strictly positive (by the
Hammersley–Clifford theorem)
* the graph is
chordal (by equivalence to a
Bayesian network
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Whi ...
)
When such a factorization does exist, it is possible to construct a
factor graph
A factor graph is a bipartite graph representing the factorization of a function (mathematics), function. In probability theory and its applications, factor graphs are used to represent factorization of a Probability distribution function (disam ...
for the network.
Exponential family
Any positive Markov random field can be written as exponential family in canonical form with feature functions
such that the full-joint distribution can be written as
:
where the notation
:
is simply a
dot product
In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a Scalar (mathematics), scalar as a result". It is also used for other symmetric bilinear forms, for example in a pseudo-Euclidean space. N ...
over field configurations, and ''Z'' is the
partition function:
:
Here,
denotes the set of all possible assignments of values to all the network's random variables. Usually, the feature functions
are defined such that they are
indicators
Indicator may refer to:
Biology
* Environmental indicator of environmental health (pressures, conditions and responses)
* Ecological indicator of ecosystem health (ecological processes)
* Health indicator, which is used to describe the health ...
of the clique's configuration, ''i.e.''
if
corresponds to the ''i''-th possible configuration of the ''k''-th clique and 0 otherwise. This model is equivalent to the clique factorization model given above, if
is the cardinality of the clique, and the weight of a feature
corresponds to the logarithm of the corresponding clique factor, ''i.e.''
, where
is the ''i''-th possible configuration of the ''k''-th clique, ''i.e.'' the ''i''-th value in the domain of the clique
.
The probability ''P'' is often called the Gibbs measure. This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, ''i.e.'' if none of the elements of
are assigned a probability of 0. This allows techniques from matrix algebra to be applied, ''e.g.'' that the
trace
Trace may refer to:
Arts and entertainment Music
* ''Trace'' (Son Volt album), 1995
* ''Trace'' (Died Pretty album), 1993
* Trace (band), a Dutch progressive rock band
* ''The Trace'' (album), by Nell
Other uses in arts and entertainment
* ...
of a matrix is log of the
determinant
In mathematics, the determinant is a Scalar (mathematics), scalar-valued function (mathematics), function of the entries of a square matrix. The determinant of a matrix is commonly denoted , , or . Its value characterizes some properties of the ...
, with the matrix representation of a graph arising from the graph's
incidence matrix
In mathematics, an incidence matrix is a logical matrix that shows the relationship between two classes of objects, usually called an incidence relation. If the first class is ''X'' and the second is ''Y'', the matrix has one row for each element o ...
.
The importance of the partition function ''Z'' is that many concepts from
statistical mechanics
In physics, statistical mechanics is a mathematical framework that applies statistical methods and probability theory to large assemblies of microscopic entities. Sometimes called statistical physics or statistical thermodynamics, its applicati ...
, such as
entropy
Entropy is a scientific concept, most commonly associated with states of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynamics, where it was first recognized, to the micros ...
, directly generalize to the case of Markov networks, and an ''intuitive'' understanding can thereby be gained. In addition, the partition function allows
variational methods to be applied to the solution of the problem: one can attach a driving force to one or more of the random variables, and explore the reaction of the network in response to this
perturbation. Thus, for example, one may add a driving term ''J''
''v'', for each vertex ''v'' of the graph, to the partition function to get:
:
Formally differentiating with respect to ''J''
''v'' gives the
expectation value
In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first moment) is a generalization of the weighted average. Informally, the expected va ...
of the random variable ''X''
''v'' associated with the vertex ''v'':
:
Correlation functions are computed likewise; the two-point correlation is:
:
Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see
'Inference' below).
Examples
Gaussian
A
multivariate normal distribution
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional ( univariate) normal distribution to higher dimensions. One d ...
forms a Markov random field with respect to a graph
if the missing edges correspond to zeros on the
precision matrix
In statistics, the precision matrix or concentration matrix is the matrix inverse of the covariance matrix or dispersion matrix, P = \Sigma^.
For univariate distributions, the precision matrix degenerates into a scalar precision, defined as the ...
(the inverse
covariance matrix
In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of ...
):
:
such that
:
Inference
As in a
Bayesian network
A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Whi ...
, one may calculate the
conditional distribution
Conditional (if then) may refer to:
* Causal conditional, if X then Y, where X is a cause of Y
*Conditional probability, the probability of an event A given that another event B
* Conditional proof, in logic: a proof that asserts a conditional, ...
of a set of nodes
given values to another set of nodes
in the Markov random field by summing over all possible assignments to
; this is called
exact inference. However, exact inference is a
#P-complete problem, and thus computationally intractable in the general case. Approximation techniques such as
Markov chain Monte Carlo
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that ...
and loopy
belief propagation
Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for ea ...
are often more feasible in practice. Some particular subclasses of MRFs, such as trees (see
Chow–Liu tree), have polynomial-time inference algorithms; discovering such subclasses is an active research topic. There are also subclasses of MRFs that permit efficient
MAP
A map is a symbolic depiction of interrelationships, commonly spatial, between things within a space. A map may be annotated with text and graphics. Like any graphic, a map may be fixed to paper or other durable media, or may be displayed on ...
, or most likely assignment, inference; examples of these include associative networks. Another interesting sub-class is the one of decomposable models (when the graph is
chordal): having a closed-form for the
MLE, it is possible to discover a consistent structure for hundreds of variables.
Conditional random fields
One notable variant of a Markov random field is a
conditional random field
Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without consi ...
, in which each random variable may also be conditioned upon a set of global observations
. In this model, each function
is a mapping from all assignments to both the
clique ''k'' and the observations
to the nonnegative real numbers. This form of the Markov network may be more appropriate for producing
discriminative classifiers, which do not model the distribution over the observations. CRFs were proposed by
John D. Lafferty,
Andrew McCallum and
Fernando C.N. Pereira in 2001.
Varied applications
Markov random fields find application in a variety of fields, ranging from
computer graphics
Computer graphics deals with generating images and art with the aid of computers. Computer graphics is a core technology in digital photography, film, video games, digital art, cell phone and computer displays, and many specialized applications. ...
to computer vision,
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
or
computational biology
Computational biology refers to the use of techniques in computer science, data analysis, mathematical modeling and Computer simulation, computational simulations to understand biological systems and relationships. An intersection of computer sci ...
,
[ and ]information retrieval
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form ...
. MRFs are used in image processing to generate textures as they can be used to generate flexible and stochastic image models. In image modelling, the task is to find a suitable intensity distribution of a given image, where suitability depends on the kind of task and MRFs are flexible enough to be used for image and texture synthesis, image compression
Image compression is a type of data compression applied to digital images, to reduce their cost for computer data storage, storage or data transmission, transmission. Algorithms may take advantage of visual perception and the statistical properti ...
and restoration, image segmentation
In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects (Set (mathematics), sets of pixels). The goal of segmen ...
, 3D image inference from 2D images, image registration
Image registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, data from different sensors, times, depths, or viewpoints. It is used in computer vision, medical imaging, mil ...
, texture synthesis, super-resolution, stereo matching and information retrieval
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form ...
. They can be used to solve various computer vision problems which can be posed as energy minimization problems or problems where different regions have to be distinguished using a set of discriminating features, within a Markov random field framework, to predict the category of the region. Markov random fields were a generalization over the Ising model and have, since then, been used widely in combinatorial optimizations and networks.
See also
* Constraint composite graph
* Graphical model
* Dependency network (graphical model)
* Hammersley–Clifford theorem
* Hopfield network
A Hopfield network (or associative memory) is a form of recurrent neural network, or a spin glass system, that can serve as a content-addressable memory. The Hopfield network, named for John Hopfield, consists of a single layer of neurons, where ...
* Interacting particle system
In probability theory, an interacting particle system (IPS) is a stochastic process (X(t))_ on some configuration space \Omega= S^G given by a site space, a countably-infinite-order graph G and a local state space, a compact metric space S ...
* Ising model
The Ising model (or Lenz–Ising model), named after the physicists Ernst Ising and Wilhelm Lenz, is a mathematical models in physics, mathematical model of ferromagnetism in statistical mechanics. The model consists of discrete variables that r ...
* Log-linear analysis
Log-linear analysis is a technique used in statistics to examine the relationship between more than two categorical variables. The technique is used for both hypothesis testing and model building. In both these uses, models are tested to find the m ...
* Markov chain
In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally ...
* Markov logic network A Markov logic network (MLN) is a probabilistic logic which applies the ideas of a Markov network to first-order logic, defining probability distributions on possible worlds on any given domain.
History
In 2002, Ben Taskar, Pieter Abbeel and ...
* Maximum entropy method
* Stochastic cellular automaton
References
{{Stochastic processes
Graphical models