A maximal entropy random walk (MERW) is a popular type of biased random walk on a graph, in which transition probabilities are chosen according to the principle of maximum entropy, which says that the probability distribution which best represents the current state of knowledge is the one with largest entropy. While a standard random walk samples for every vertex a uniform probability distribution of outgoing edges, locally maximizing the entropy rate, MERW maximizes it globally (average entropy production) by sampling a uniform probability distribution among all paths in a given graph.
MERW is used in various fields of science. A direct application is choosing probabilities to maximize the transmission rate through a constrained channel, analogously to Fibonacci coding. Its properties also made it useful in the analysis of complex networks, for example in link prediction, community detection, robust transport over networks and centrality measures. It is also used in image analysis, for example for detecting visual saliency regions, object localization [L. Wang, J. Zhao, X. Hu, J. Lu, ''Weakly supervised object localization via maximal entropy random walk'', ICIP, 2014], tampering detection or the tractography problem. Additionally, it recreates some properties of quantum mechanics, suggesting a way to repair the discrepancy between diffusion models and quantum predictions, like Anderson localization.
Basic model
Consider a graph with $n$ vertices, defined by an adjacency matrix $A$: $A_{ij} = 1$ if there is an edge from vertex $i$ to $j$, 0 otherwise. For simplicity, assume it is an undirected graph, which corresponds to a symmetric $A$; however, MERWs can also be generalized for directed and weighted graphs (obtaining for example a Boltzmann distribution among paths instead of a uniform one).
We would like to choose a random walk as a Markov process on this graph: for every vertex $i$ and its outgoing edge to $j$, choose the probability $S_{ij}$ of the walker randomly using this edge after visiting $i$. Formally, find a stochastic matrix $S$ (containing the transition probabilities of a Markov chain) such that
* $0 \le S_{ij} \le A_{ij}$ for all $i, j$, and
* $\sum_j S_{ij} = 1$ for all $i$.
Assuming this graph is connected and not periodic, ergodic theory says that evolution of this stochastic process leads to some stationary probability distribution $\rho$ such that $\sum_i \rho_i S_{ij} = \rho_j$.
Using the Shannon entropy for every vertex and averaging over the probability of visiting this vertex (to be able to use its entropy), we get the following formula for average entropy production (entropy rate) of the stochastic process:
: $H(S) = \sum_i \rho_i \sum_j S_{ij} \log \frac{1}{S_{ij}}$.
This definition turns out to be equivalent to the asymptotic average entropy (per length) of the probability distribution in the space of paths for this stochastic process.
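This entropy rate can be computed directly from the formula above; a minimal sketch in Python (assuming NumPy, with the convention that terms with $S_{ij} = 0$ contribute zero):

```python
import numpy as np

def entropy_rate(S, rho):
    """H(S) = sum_i rho_i sum_j S_ij log(1/S_ij); terms with S_ij = 0 contribute 0."""
    S = np.asarray(S, dtype=float)
    rho = np.asarray(rho, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(S > 0, -S * np.log(S), 0.0)
    return float(rho @ terms.sum(axis=1))
```

For a two-state chain that flips a fair coin at every step this gives $\log 2$; for a deterministic walk it gives 0.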
In the standard random walk, referred to here as the generic random walk (GRW), we naturally choose each outgoing edge to be equally probable:
: $S_{ij} = \frac{A_{ij}}{\sum_k A_{ik}}$.
For a symmetric $A$ it leads to a stationary probability distribution $\rho$ with
: $\rho_i = \frac{\sum_k A_{ik}}{\sum_{j,k} A_{jk}}$.
It locally maximizes entropy production (uncertainty) for every vertex, but usually leads to a suboptimal averaged global entropy rate $H(S)$.
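As a concrete check (a sketch; the 3-vertex path graph is an arbitrary example), the GRW transition matrix and its stationary distribution follow directly from the adjacency matrix:

```python
import numpy as np

# Adjacency matrix of a small undirected path graph 0 - 1 - 2 (arbitrary example).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

deg = A.sum(axis=1)                # vertex degrees sum_k A_ik
S_grw = A / deg[:, None]           # S_ij = A_ij / sum_k A_ik
rho_grw = deg / deg.sum()          # stationary distribution, proportional to degree

assert np.allclose(rho_grw @ S_grw, rho_grw)   # stationarity: rho S = rho
```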
MERW chooses the stochastic matrix which maximizes $H(S)$, or equivalently assumes a uniform probability distribution among all paths in a given graph. Its formula is obtained by first calculating the dominant eigenvalue $\lambda$ and corresponding eigenvector $\psi$ of the adjacency matrix, i.e. the largest $\lambda$ with corresponding $\psi$ such that $A\psi = \lambda\psi$. Then the stochastic matrix is given by
: $S_{ij} = \frac{A_{ij}}{\lambda} \frac{\psi_j}{\psi_i}$,
for which every possible path of length $l$ from the $i$-th to the $j$-th vertex has probability
: $\frac{1}{\lambda^l} \frac{\psi_j}{\psi_i}$.
Its entropy rate is $\log(\lambda)$ and the stationary probability distribution $\rho$ is
: $\rho_i = \frac{\psi_i^2}{\sum_j \psi_j^2}$.
In contrast to GRW, the MERW transition probabilities generally depend on the structure of the entire graph, making it nonlocal. Hence, they should not be imagined as directly applied by the walker: if random-looking decisions are made based on the local situation, like a person would make, the GRW approach is more appropriate. MERW is based on the principle of maximum entropy, making it the safest assumption when we do not have any additional knowledge about the system. For example, it would be appropriate for modelling our knowledge about an object performing some complex dynamics, not necessarily random, like a particle.
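A minimal sketch of the above recipe in Python (assuming NumPy and a connected graph with symmetric adjacency matrix, so that `numpy.linalg.eigh` applies and the Perron eigenvector has no zero entries):

```python
import numpy as np

def merw(A):
    """MERW from the dominant eigenpair of a symmetric adjacency matrix A:
    S_ij = A_ij * psi_j / (lam * psi_i), rho_i = psi_i^2 / sum psi^2, rate log(lam)."""
    w, v = np.linalg.eigh(A)
    lam, psi = w[-1], np.abs(v[:, -1])     # Perron eigenpair; eigenvector chosen positive
    S = A * psi[None, :] / (lam * psi[:, None])
    rho = psi ** 2 / np.sum(psi ** 2)
    return S, rho, np.log(lam)
```

For the two-state Fibonacci-coding graph with adjacency matrix rows (1, 1) and (1, 0), this yields the entropy rate equal to the logarithm of the golden ratio.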
Sketch of derivation
Assume for simplicity that the considered graph is undirected, connected and aperiodic, allowing to conclude from the Perron–Frobenius theorem that the dominant eigenvector is unique. Hence $A^l$ can be asymptotically ($l \to \infty$) approximated by $\lambda^l \psi \psi^T$ (or $\lambda^l |\psi\rangle\langle\psi|$ in bra–ket notation), assuming $\psi$ is normalized.
MERW requires a uniform distribution along paths. The number $m_{il}$ of paths of length $2l$ with vertex $i$ in the center is
: $m_{il} = \sum_{j,k} (A^l)_{ji} (A^l)_{ik} \approx \lambda^{2l} \psi_i^2 \Big(\sum_j \psi_j\Big)^2$,
hence for all $i$,
: $\rho_i = \lim_{l \to \infty} \frac{m_{il}}{\sum_k m_{kl}} = \frac{\psi_i^2}{\sum_k \psi_k^2}$.
Analogously calculating the probability distribution for two succeeding vertices, one obtains that the probability of being at the $i$-th vertex and next at the $j$-th vertex is
: $\frac{\psi_i \, A_{ij} \, \psi_j}{\lambda \sum_k \psi_k^2}$.
Dividing by the probability of being at the $i$-th vertex, i.e. $\rho_i$, gives for the conditional probability of the $j$-th vertex being next after the $i$-th vertex
: $S_{ij} = \frac{A_{ij}}{\lambda} \frac{\psi_j}{\psi_i}$.
Weighted MERW: Boltzmann path ensemble
We have assumed that $A_{ij} \in \{0, 1\}$, yielding a MERW corresponding to the uniform ensemble among paths. However, the above derivation works for any real nonnegative $A$ for which the Perron–Frobenius theorem applies. Given such an $A$, the probability of a particular length-$l$ path $(\gamma_0, \gamma_1, \ldots, \gamma_l)$ is as follows:
: $\rho_{\gamma_0} S_{\gamma_0 \gamma_1} \cdots S_{\gamma_{l-1} \gamma_l} = \frac{\psi_{\gamma_0} \psi_{\gamma_l}}{\lambda^l \sum_i \psi_i^2} \, A_{\gamma_0 \gamma_1} \cdots A_{\gamma_{l-1} \gamma_l}$,
which is the same as the Boltzmann distribution of paths with energy defined as the sum of $-\ln A_{ij}$ over the edges of the path. For example, this can be used with the transfer matrix to calculate the probability distribution of patterns in the Ising model.
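This identity between the Markov-chain path probability and the closed Boltzmann form can be verified numerically; a sketch with arbitrary random symmetric positive weights:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((4, 4)) + 0.1
A = (A + A.T) / 2                          # symmetric, strictly positive weight matrix

w, v = np.linalg.eigh(A)
lam, psi = w[-1], np.abs(v[:, -1])         # Perron eigenpair
S = A * psi[None, :] / (lam * psi[:, None])
rho = psi ** 2 / np.sum(psi ** 2)

path = [0, 2, 1, 3]                        # an arbitrary length-3 path
steps = list(zip(path, path[1:]))
# probability of the path from the Markov chain: rho * product of transitions
p_chain = rho[path[0]] * np.prod([S[i, j] for i, j in steps])
# closed form: psi_start psi_end prod(A) / (lam^l sum psi^2)
p_boltzmann = (psi[path[0]] * psi[path[-1]]
               * np.prod([A[i, j] for i, j in steps])
               / (lam ** 3 * np.sum(psi ** 2)))
assert np.isclose(p_chain, p_boltzmann)
```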
Examples
Let us first look at a simple nontrivial situation: Fibonacci coding, where we want to transmit a message as a sequence of 0s and 1s, but not using two successive 1s: after a 1 there has to be a 0. To maximize the amount of information transmitted in such a sequence, we should assume a uniform probability distribution in the space of all possible sequences fulfilling this constraint.
To practically use such long sequences, after a 1 we have to use a 0, but there remains the freedom of choosing the probability of 0 after 0. Let us denote this probability by $q$. Entropy coding allows encoding a message using this chosen probability distribution. The stationary probability distribution of symbols (0, 1) for a given $q$ turns out to be $\left(\frac{1}{2-q}, \frac{1-q}{2-q}\right)$. Hence, the entropy produced per symbol is $\frac{-q \log q - (1-q) \log(1-q)}{2-q}$, which is maximized for $q = (\sqrt{5}-1)/2 \approx 0.618$, the reciprocal of the golden ratio. In contrast, a standard random walk would choose the suboptimal $q = 1/2$. While choosing a larger $q$ reduces the amount of information produced after 0, it also reduces the frequency of 1, after which we cannot write any information.
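The maximization over $q$ can be checked numerically; a sketch using a simple grid search (the grid resolution and the use of base-2 logarithms are arbitrary choices):

```python
import numpy as np

def entropy_per_symbol(q):
    """Entropy rate of the constrained source: only the choice after a 0 produces
    entropy h(q), and a 0 occurs with stationary probability 1/(2-q)."""
    h = -q * np.log2(q) - (1 - q) * np.log2(1 - q)   # binary entropy of the choice
    return h / (2 - q)

qs = np.linspace(0.01, 0.99, 9801)
q_best = qs[np.argmax(entropy_per_symbol(qs))]
print(q_best)                       # close to (sqrt(5) - 1) / 2, i.e. about 0.618
```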
A more complex case is the defected one-dimensional cyclic lattice, for example a ring with 1000 connected nodes, of which all nodes but the defects have a self-loop (edge to itself). In a standard random walk (GRW), the stationary probability of a defect would be 2/3 of the probability of the non-defect vertices: there is nearly no localization, and analogously for standard diffusion, which is the infinitesimal limit of a GRW. For a MERW, we have to first find the dominant eigenvector of the adjacency matrix, maximizing $\lambda$ in:
: $\lambda \psi_x = \psi_{x-1} + (1 - V_x)\psi_x + \psi_{x+1}$
for all positions $x$, where $V_x = 1$ for defects, 0 otherwise. Substituting $E = 3 - \lambda$ and multiplying the equation by −1 we get:
: $E \psi_x = -(\psi_{x-1} - 2\psi_x + \psi_{x+1}) + V_x \psi_x$
where $E$ is minimized now, becoming the analog of energy. The formula inside the bracket is the discrete Laplace operator, making this equation a discrete analogue of the stationary Schrödinger equation. As in quantum mechanics, MERW predicts that the probability distribution is that of the quantum ground state: $\rho_x \propto \psi_x^2$, with its strongly localized density (in contrast to standard diffusion). Taking the infinitesimal limit, we can get the standard continuous stationary (time-independent) Schrödinger equation ($E\psi = -C\psi_{xx} + V\psi$ for some constant $C$) here.
[J. Duda, ''Extended Maximal Entropy Random Walk'', PhD Thesis, 2012.]
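The contrast between GRW and MERW on such a defected ring can be reproduced in a short numerical sketch (the two defect positions are an arbitrary choice for illustration):

```python
import numpy as np

n = 1000
# Ring with nearest-neighbour edges plus a self-loop at every vertex ...
A = np.eye(n) + np.roll(np.eye(n), 1, axis=0) + np.roll(np.eye(n), -1, axis=0)
for d in (0, 500):                 # ... except at two defects (arbitrary positions)
    A[d, d] = 0.0

# GRW: stationary probability proportional to degree -> almost flat
deg = A.sum(axis=1)
rho_grw = deg / deg.sum()

# MERW: squared dominant eigenvector -> strongly localized away from the defects
w, v = np.linalg.eigh(A)
psi = np.abs(v[:, -1])
rho_merw = psi ** 2 / np.sum(psi ** 2)

assert np.isclose(rho_grw[0] / rho_grw[1], 2 / 3)   # defect carries 2/3 of a regular vertex
assert rho_merw.max() / rho_merw[0] > 100           # MERW suppresses the defect strongly
```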
See also
* Principle of maximum entropy
* Eigenvector centrality
* Markov chain
* Anderson localization
References
External links
* Gábor Simonyi
* Y. Lin, Z. Zhang, "Mean first-passage time for maximal-entropy random walks in complex networks", Scientific Reports, 2014.
* Electron Conductance Models Using Maximal Entropy Random Walks, Wolfram Demonstrations Project