The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics. It is a special case of a normalizing constant in probability theory, for the Boltzmann distribution. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated probability measure, the Gibbs measure, has the Markov property. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the Hopfield network), and applications such as genomics, corpus linguistics and artificial intelligence, which employ Markov networks and Markov logic networks. The Gibbs measure is also the unique measure that has the property of maximizing the entropy for a fixed expectation value of the energy; this underlies the appearance of the partition function in maximum entropy methods and the algorithms derived therefrom.

The partition function ties together many different concepts, and thus offers a general framework in which many different kinds of quantities may be calculated. In particular, it shows how to calculate expectation values and Green's functions, forming a bridge to Fredholm theory. It also provides a natural setting for the information geometry approach to information theory, where the Fisher information metric can be understood to be a correlation function derived from the partition function; it happens to define a Riemannian manifold.

When the setting for random variables is complex projective space or projective Hilbert space, geometrized with the Fubini–Study metric, the theory of quantum mechanics and, more generally, quantum field theory results. In these theories, the partition function is heavily exploited in the path integral formulation, with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued simplex of probability theory, an extra factor of ''i'' appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the sum of probabilities totals to one.


Definition

Given a set of random variables X_i taking on values x_i, and some sort of potential function or Hamiltonian H(x_1,x_2,\dots), the partition function is defined as

:Z(\beta) = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) \right)

The function ''H'' is understood to be a real-valued function on the space of states \{x_1,x_2,\dots\}, while \beta is a real-valued free parameter (conventionally, the inverse temperature). The sum over the x_i is understood to be a sum over all possible values that each of the random variables X_i may take. Thus, the sum is to be replaced by an integral when the X_i are continuous, rather than discrete. In that case, one writes

:Z(\beta) = \int \exp \left(-\beta H(x_1,x_2,\dots) \right) \, dx_1 \, dx_2 \cdots

for the case of continuously-varying X_i.

When ''H'' is an observable, such as a finite-dimensional matrix or an infinite-dimensional Hilbert space operator or element of a C*-algebra, it is common to express the summation as a trace, so that

:Z(\beta) = \operatorname{tr}\left(\exp\left(-\beta H\right)\right)

When ''H'' is infinite-dimensional, then, for the above notation to be valid, the argument must be trace class, that is, of a form such that the summation exists and is bounded.

The number of variables X_i need not be countable, in which case the sums are to be replaced by functional integrals. Although there are many notations for functional integrals, a common one would be

:Z = \int \mathcal{D}\varphi \, \exp \left(- \beta H[\varphi]\right)

Such is the case for the partition function in quantum field theory.

A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a generating function for correlation functions. This is discussed in greater detail below.
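As a concrete illustration, the sum and trace forms of the definition can be checked against each other on a small discrete system. The sketch below assumes a hypothetical two-spin Hamiltonian H(x_1,x_2) = -x_1 x_2, chosen only for illustration:

```python
import numpy as np

# Toy two-spin system: each X_i takes values in {-1, +1}, with a
# hypothetical Hamiltonian H(x1, x2) = -x1*x2 (a ferromagnetic pair).
def H(x1, x2):
    return -x1 * x2

def Z(beta):
    """Partition function as a sum over all configurations."""
    return sum(np.exp(-beta * H(x1, x2))
               for x1 in (-1, 1) for x2 in (-1, 1))

# At beta = 0 every configuration contributes 1, so Z equals the
# number of configurations (here 4).
print(Z(0.0))    # 4.0
print(Z(1.0))    # 2*e + 2/e ≈ 6.1723

# Trace form: for H given as a (here diagonal) Hermitian matrix,
# Z = tr(exp(-beta*H)) = sum of exp(-beta * eigenvalue).
Hmat = np.diag([H(x1, x2) for x1 in (-1, 1) for x2 in (-1, 1)])
eigs = np.linalg.eigvalsh(Hmat)
print(np.exp(-1.0 * eigs).sum())   # matches Z(1.0)
```

For a non-diagonal observable the same trace formula applies after diagonalizing, which is why the trace-class condition matters in the infinite-dimensional case: the sum over eigenvalues must converge.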


The parameter β

The role or meaning of the parameter \beta can be understood in a variety of different ways. In classical thermodynamics, it is an inverse temperature. More generally, one would say that it is the variable that is conjugate to some (arbitrary) function H of the random variables X. The word ''conjugate'' here is used in the sense of conjugate generalized coordinates in Lagrangian mechanics; thus, properly, \beta is a Lagrange multiplier. It is not uncommonly called the generalized force. All of these concepts have in common the idea that one value is meant to be kept fixed, as others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the expectation value of H, even as many different probability distributions can give rise to exactly this same (fixed) value.

For the general case, one considers a set of functions \{H_k(x_1,x_2,\dots)\} that each depend on the random variables X_i. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of Lagrange multipliers. In the general case, maximum entropy methods illustrate the manner in which this is done.

Some specific examples are in order. In basic thermodynamics problems, when using the canonical ensemble, the use of just one parameter \beta reflects the fact that there is only one expectation value that must be held constant: the free energy (due to conservation of energy). For chemistry problems involving chemical reactions, the grand canonical ensemble provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the energy constant, and another, the fugacity, is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms).

For the general case, one has

:Z(\beta) = \sum_{x_i} \exp \left(-\sum_k \beta_k H_k(x_i) \right)

with \beta=(\beta_1, \beta_2,\dots) a point in a space. For a collection of observables H_k, one would write

:Z(\beta) = \operatorname{tr}\left[\exp \left(-\sum_k \beta_k H_k\right)\right]

As before, it is presumed that the argument of tr is trace class.

The corresponding Gibbs measure then provides a probability distribution such that the expectation value of each H_k is a fixed value. More precisely, one has

:\frac{\partial}{\partial \beta_k} \left(- \log Z \right) = \langle H_k\rangle = \mathrm{E}\left[H_k\right]

with the angle brackets \langle H_k \rangle denoting the expected value of H_k, and \mathrm{E}[\,\cdot\,] being a common alternative notation. A precise definition of this expectation value is given below.

Although the value of \beta is commonly taken to be real, it need not be, in general; this is discussed in the section Normalization below. The values of \beta can be understood to be the coordinates of points in a space; this space is in fact a manifold, as sketched below. The study of these spaces as manifolds constitutes the field of information geometry.
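The identity \partial(-\log Z)/\partial\beta_k = \langle H_k\rangle can be checked numerically. The sketch below assumes two arbitrary observables H_1, H_2 on a four-state space (hypothetical choices, for illustration only) and verifies the identity for \beta_1 by a central finite difference:

```python
import numpy as np

# Two observables on a small discrete state space (assumed for the sketch).
states = [(x1, x2) for x1 in (-1, 1) for x2 in (-1, 1)]
H1 = lambda x: -x[0] * x[1]    # e.g. an interaction energy
H2 = lambda x: x[0] + x[1]     # e.g. a magnetization-like quantity

def logZ(beta):
    b1, b2 = beta
    return np.log(sum(np.exp(-b1*H1(x) - b2*H2(x)) for x in states))

def expect(f, beta):
    """Expectation of f under the Gibbs distribution at parameters beta."""
    b1, b2 = beta
    w = np.array([np.exp(-b1*H1(x) - b2*H2(x)) for x in states])
    p = w / w.sum()
    return sum(pi * f(x) for pi, x in zip(p, states))

# Check d(-log Z)/d(beta_1) = <H_1> by central finite difference.
beta, eps = (0.7, 0.3), 1e-6
lhs = -(logZ((beta[0]+eps, beta[1])) - logZ((beta[0]-eps, beta[1]))) / (2*eps)
print(lhs, expect(H1, beta))   # the two values agree closely
```

Each \beta_k here plays the role of the Lagrange multiplier conjugate to its H_k: varying \beta_k moves the constrained expectation value \langle H_k\rangle.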


Symmetry

The potential function itself commonly takes the form of a sum:

:H(x_1,x_2,\dots) = \sum_s V(s)\,

where the sum over ''s'' is a sum over some subset of the power set ''P''(''X'') of the set X=\lbrace x_1,x_2,\dots \rbrace. For example, in statistical mechanics, such as the Ising model, the sum is over pairs of nearest neighbors. In probability theory, such as Markov networks, the sum might be over the cliques of a graph; so, for the Ising model and other lattice models, the maximal cliques are edges.

The fact that the potential function can be written as a sum usually reflects the fact that it is invariant under the action of a group symmetry, such as translational invariance. Such symmetries can be discrete or continuous; they materialize in the correlation functions for the random variables (discussed below). Thus a symmetry in the Hamiltonian becomes a symmetry of the correlation function (and vice versa). This symmetry has a critically important interpretation in probability theory: it implies that the Gibbs measure has the Markov property; that is, it is independent of the random variables in a certain way, or, equivalently, the measure is identical on the equivalence classes of the symmetry. This leads to the widespread appearance of the partition function in problems with the Markov property, such as Hopfield networks.
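For instance, the nearest-neighbour Hamiltonian of a one-dimensional Ising chain is exactly such a sum over the edges (the maximal cliques) of the chain graph. For an open chain the partition function has a simple closed form, which a brute-force sum over configurations reproduces (the coupling J and the value of \beta below are arbitrary illustrative choices):

```python
import itertools
import numpy as np

# Open 1-D Ising chain of N spins: H is a sum over edges (i, i+1)
# of the potential V(s) = -J * s_i * s_{i+1}.
N, J, beta = 5, 1.0, 0.5
edges = [(i, i + 1) for i in range(N - 1)]

def H(spins):
    return sum(-J * spins[i] * spins[j] for i, j in edges)

# Brute-force sum over all 2^N spin configurations.
Z = sum(np.exp(-beta * H(s))
        for s in itertools.product((-1, 1), repeat=N))

# Known closed form for the open chain: Z = 2^N * cosh(beta*J)^(N-1).
print(Z, 2**N * np.cosh(beta * J)**(N - 1))   # the two agree
```

The translational symmetry of the chain is visible in the code: every edge contributes the same potential V, which is what lets the closed form factorize edge by edge.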


As a measure

The value of the expression

:\exp \left(-\beta H(x_1,x_2,\dots) \right)

can be interpreted as a likelihood that a specific configuration of values (x_1,x_2,\dots) occurs in the system. Thus, given a specific configuration (x_1,x_2,\dots),

:P(x_1,x_2,\dots) = \frac{1}{Z(\beta)} \exp \left(-\beta H(x_1,x_2,\dots) \right)

is the probability of the configuration (x_1,x_2,\dots) occurring in the system, which is now properly normalized so that 0\le P(x_1,x_2,\dots)\le 1, and such that the sum over all configurations totals to one. As such, the partition function can be understood to provide a measure (a probability measure) on the probability space; formally, it is called the Gibbs measure. It generalizes the narrower concepts of the grand canonical ensemble and canonical ensemble in statistical mechanics.

There exists at least one configuration (x_1,x_2,\dots) for which the probability is maximized; this configuration is conventionally called the ground state. If the configuration is unique, the ground state is said to be non-degenerate, and the system is said to be ergodic; otherwise the ground state is degenerate. The ground state may or may not commute with the generators of the symmetry; if it commutes, it is said to be an invariant measure. When it does not commute, the symmetry is said to be spontaneously broken.

Conditions under which a ground state exists and is unique are given by the Karush–Kuhn–Tucker conditions; these conditions are commonly used to justify the use of the Gibbs measure in maximum-entropy problems.
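A minimal numeric sketch of the Gibbs measure, assuming the same hypothetical two-spin Hamiltonian used for illustration earlier: the probabilities sum to one, and the maximizing configurations form a (here degenerate) ground state:

```python
import numpy as np

# Gibbs measure P(x) = exp(-beta*H(x)) / Z on a four-state space.
states = [(x1, x2) for x1 in (-1, 1) for x2 in (-1, 1)]
H = lambda x: -x[0] * x[1]     # hypothetical Hamiltonian
beta = 1.0

weights = np.array([np.exp(-beta * H(x)) for x in states])
Z = weights.sum()
P = weights / Z

print(P.sum())   # 1.0 -- properly normalized

# The most probable configurations are the ground states of H; here H
# is minimized by the two aligned configurations, so the ground state
# is degenerate.
ground = [s for s, p in zip(states, P) if np.isclose(p, P.max())]
print(ground)    # [(-1, -1), (1, 1)]
```

The degeneracy here reflects the spin-flip symmetry of this H: flipping both spins leaves the Hamiltonian, and hence the measure, unchanged.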


Normalization

The values taken by \beta depend on the mathematical space over which the random field varies. Thus, real-valued random fields take values on a simplex: this is the geometrical way of saying that the sum of probabilities must total to one. For quantum mechanics, the random variables range over complex projective space (or complex-valued projective Hilbert space), where the random variables are interpreted as probability amplitudes. The emphasis here is on the word ''projective'', as the amplitudes are still normalized to one. The normalization for the potential function is the Jacobian for the appropriate mathematical space: it is 1 for ordinary probabilities, and ''i'' for Hilbert space; thus, in quantum field theory, one sees itH in the exponential, rather than \beta H. The partition function is very heavily exploited in the path integral formulation of quantum field theory, to great effect. The theory there is very nearly identical to that presented here, aside from this difference, and the fact that it is usually formulated on four-dimensional space-time, rather than in a general way.


Expectation values

The partition function is commonly used as a probability-generating function for expectation values of various functions of the random variables. So, for example, taking \beta as an adjustable parameter, the derivative of \log(Z(\beta)) with respect to \beta

:\mathrm{E}[H] = \langle H \rangle = -\frac{\partial \log(Z(\beta))}{\partial \beta}

gives the average (expectation value) of ''H''. In physics, this would be called the average energy of the system.

Given the definition of the probability measure above, the expectation value of any function ''f'' of the random variables ''X'' may now be written as expected: so, for discrete-valued ''X'', one writes

:\begin{align} \langle f\rangle & = \sum_{x_i} f(x_1,x_2,\dots) P(x_1,x_2,\dots) \\ & = \frac{1}{Z(\beta)} \sum_{x_i} f(x_1,x_2,\dots) \exp \left(-\beta H(x_1,x_2,\dots) \right) \end{align}

The above notation is strictly correct for a finite number of discrete random variables, but should be seen to be somewhat 'informal' for continuous variables; properly, the summations above should be replaced with the notations of the underlying sigma algebra used to define a probability space. That said, the identities continue to hold, when properly formulated on a measure space.

Thus, for example, the entropy is given by

:\begin{align} S & = -k_B \langle\ln P\rangle \\ & = -k_B\sum_{x_i} P(x_1,x_2,\dots) \ln P(x_1,x_2,\dots) \\ & = k_B(\beta \langle H\rangle + \log Z(\beta)) \end{align}

The Gibbs measure is the unique statistical distribution that maximizes the entropy for a fixed expectation value of the energy; this underlies its use in maximum entropy methods.
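Both identities, the derivative formula for \langle H\rangle and the entropy expression, can be verified on a toy two-level system (assumed energies 0 and 1, with k_B = 1):

```python
import numpy as np

# Two-level system with assumed energies 0 and 1 (illustrative only).
energies = np.array([0.0, 1.0])
beta = 2.0

Z = np.exp(-beta * energies).sum()
P = np.exp(-beta * energies) / Z

# <H> computed directly from the Gibbs distribution ...
avg_direct = (P * energies).sum()

# ... and as -d(log Z)/d(beta), via a central finite difference.
eps = 1e-6
logZ = lambda b: np.log(np.exp(-b * energies).sum())
avg_from_Z = -(logZ(beta + eps) - logZ(beta - eps)) / (2 * eps)
print(avg_direct, avg_from_Z)   # the two agree closely

# Entropy with k_B = 1: S = -<ln P> = beta*<H> + log Z.
S = -(P * np.log(P)).sum()
print(S, beta * avg_direct + np.log(Z))   # the two expressions agree
```

The last line is just the identity S = k_B(\beta\langle H\rangle + \log Z) evaluated numerically; in thermodynamic terms it relates entropy, average energy and the free energy -\log Z / \beta.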


Information geometry

The points \beta can be understood to form a space, and specifically, a
manifold In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point. More precisely, an n-dimensional manifold, or ''n-manifold'' for short, is a topological space with the property that each point has a ...
. Thus, it is reasonable to ask about the structure of this manifold; this is the task of information geometry. Multiple derivatives with regard to the Lagrange multipliers gives rise to a positive semi-definite
covariance matrix In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements o ...
:g_(\beta) = \frac \left(-\log Z(\beta)\right) = \langle \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right)\rangle This matrix is positive semi-definite, and may be interpreted as a
metric tensor In the mathematical field of differential geometry, a metric tensor (or simply metric) is an additional structure on a manifold (such as a surface) that allows defining distances and angles, just as the inner product on a Euclidean space allo ...
, specifically, a
Riemannian metric In differential geometry, a Riemannian manifold or Riemannian space , so called after the German mathematician Bernhard Riemann, is a real, smooth manifold ''M'' equipped with a positive-definite inner product ''g'p'' on the tangent spac ...
. Equipping the space of Lagrange multipliers with a metric in this way turns it into a
Riemannian manifold In differential geometry, a Riemannian manifold or Riemannian space , so called after the German mathematician Bernhard Riemann, is a real, smooth manifold ''M'' equipped with a positive-definite inner product ''g'p'' on the tangent spac ...
. The study of such manifolds is referred to as information geometry; the metric above is the Fisher information metric. Here, \beta serves as a coordinate on the manifold. It is interesting to compare the above definition to the simpler
Fisher information In mathematical statistics, the Fisher information (sometimes simply called information) is a way of measuring the amount of information that an observable random variable ''X'' carries about an unknown parameter ''θ'' of a distribution that model ...
, by which it is inspired. That the above defines the Fisher information metric can be readily seen by explicitly substituting for the expectation value:

:\begin{align} g_{ij}(\beta) & = \langle \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right)\rangle \\ & = \sum_x P(x) \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right) \\ & = \sum_x P(x) \left(H_i + \frac{\partial \log Z(\beta)}{\partial \beta^i}\right) \left(H_j + \frac{\partial \log Z(\beta)}{\partial \beta^j}\right) \\ & = \sum_x P(x) \frac{\partial \log P(x)}{\partial \beta^i} \frac{\partial \log P(x)}{\partial \beta^j} \end{align}

where we've written P(x) for P(x_1,x_2,\dots) and the summation is understood to be over all values of all random variables X_k. For continuous-valued random variables, the summations are replaced by integrals. Curiously, the Fisher information metric can also be understood as the flat-space
Euclidean metric In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, therefore o ...
, after appropriate change of variables, as described in the main article on it. When the \beta are complex-valued, the resulting metric is the Fubini–Study metric. When written in terms of mixed states, instead of
pure state In quantum physics, a quantum state is a mathematical entity that provides a probability distribution for the outcomes of each possible measurement on a system. Knowledge of the quantum state together with the rules for the system's evolution in ...
s, it is known as the Bures metric.
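The equality of the covariance matrix of the observables and the Hessian of \log Z(\beta) can be checked numerically. The sketch below uses a made-up three-state system with two illustrative observables H_1 and H_2; all numerical values are assumptions for demonstration only.

```python
import math

# Hypothetical three-state system with two observables (illustrative values).
H1 = [0.0, 1.0, 2.0]
H2 = [1.0, 0.0, 1.0]
beta = (0.5, 0.3)

def log_Z(b1, b2):
    return math.log(sum(math.exp(-b1*h1 - b2*h2) for h1, h2 in zip(H1, H2)))

# Gibbs probabilities at the chosen beta
w = [math.exp(-beta[0]*h1 - beta[1]*h2) for h1, h2 in zip(H1, H2)]
Z = sum(w)
P = [wi / Z for wi in w]

# Metric as the covariance matrix g_ij = <(H_i - <H_i>)(H_j - <H_j>)>
obs = [H1, H2]
means = [sum(p*h for p, h in zip(P, H)) for H in obs]
g = [[sum(p*(hi - means[i])*(hj - means[j])
          for p, hi, hj in zip(P, obs[i], obs[j]))
      for j in range(2)] for i in range(2)]

# Same metric as the Hessian of log Z, via central finite differences
eps = 1e-4
def hessian(i, j):
    def step(s_i, s_j):
        db = [0.0, 0.0]
        db[i] += s_i * eps
        db[j] += s_j * eps
        return log_Z(beta[0] + db[0], beta[1] + db[1])
    return (step(1, 1) - step(1, -1) - step(-1, 1) + step(-1, -1)) / (4*eps*eps)

for i in range(2):
    for j in range(2):
        assert abs(g[i][j] - hessian(i, j)) < 1e-6
```

The diagonal entries are variances, so the matrix is positive semi-definite by construction, as the text states.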


Correlation functions

By introducing artificial auxiliary functions J_k into the partition function, one can use it to obtain the expectation values of the random variables. Thus, for example, by writing

:\begin{align} Z(\beta,J) & = Z(\beta,J_1,J_2,\dots) \\ & = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) + \sum_n J_n x_n \right) \end{align}

one then has

:\mathbf{E}[x_k] = \langle x_k \rangle = \left. \frac{\partial}{\partial J_k} \log Z(\beta,J)\right|_{J=0}

as the expectation value of x_k. In the
path integral formulation The path integral formulation is a description in quantum mechanics that generalizes the action principle of classical mechanics. It replaces the classical notion of a single, unique classical trajectory for a system with a sum, or functional ...
of
quantum field theory In theoretical physics, quantum field theory (QFT) is a theoretical framework that combines classical field theory, special relativity, and quantum mechanics. QFT is used in particle physics to construct physical models of subatomic particles a ...
, these auxiliary functions are commonly referred to as source fields. Multiple differentiations lead to the connected correlation functions of the random variables. Thus the correlation function C(x_j,x_k) between variables x_j and x_k is given by:

:C(x_j,x_k) = \left. \frac{\partial}{\partial J_j} \frac{\partial}{\partial J_k} \log Z(\beta,J)\right|_{J=0}
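The source-field trick can be illustrated on a toy system of two coupled spins; the coupling energy and inverse temperature below are illustrative assumptions, not part of the original text.

```python
import math
from itertools import product

# Toy two-spin system, x_i in {-1, +1}, with a made-up coupling energy.
def H(x1, x2):
    return -x1 * x2

beta = 0.8

# Partition function with source terms J_n x_n added to the exponent
def log_Z(J1, J2):
    return math.log(sum(math.exp(-beta * H(x1, x2) + J1*x1 + J2*x2)
                        for x1, x2 in product((-1, 1), repeat=2)))

# Direct Gibbs expectation values, for comparison
Z0 = math.exp(log_Z(0.0, 0.0))
def expect(f):
    return sum(f(x1, x2) * math.exp(-beta * H(x1, x2)) / Z0
               for x1, x2 in product((-1, 1), repeat=2))

eps = 1e-4
# <x1> = d(log Z)/dJ1 at J = 0, by central differences
mean_x1 = (log_Z(eps, 0.0) - log_Z(-eps, 0.0)) / (2*eps)

# Connected correlator C(x1,x2) = d^2(log Z)/dJ1 dJ2 at J = 0
C12 = (log_Z(eps, eps) - log_Z(eps, -eps)
       - log_Z(-eps, eps) + log_Z(-eps, -eps)) / (4*eps*eps)

assert abs(mean_x1 - expect(lambda a, b: a)) < 1e-6
assert abs(C12 - (expect(lambda a, b: a*b)
                  - expect(lambda a, b: a) * expect(lambda a, b: b))) < 1e-5
```

Differentiating log Z, rather than Z itself, is what makes the correlator *connected*: the product of means is automatically subtracted.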


Gaussian integrals

For the case where ''H'' can be written as a
quadratic form In mathematics, a quadratic form is a polynomial with terms all of degree two ("form" is another name for a homogeneous polynomial). For example, :4x^2 + 2xy - 3y^2 is a quadratic form in the variables and . The coefficients usually belong to ...
involving a differential operator, that is, as

:H = \frac{1}{2} \sum_n x_n D x_n

then the partition function can be understood to be a sum or
integral In mathematics, an integral assigns numbers to functions in a way that describes displacement, area, volume, and other concepts that arise by combining infinitesimal data. The process of finding integrals is called integration. Along with ...
over Gaussians. The correlation function C(x_j,x_k) can be understood to be the
Green's function In mathematics, a Green's function is the impulse response of an inhomogeneous linear differential operator defined on a domain with specified initial conditions or boundary conditions. This means that if \operatorname is the linear differenti ...
for the differential operator (and generally giving rise to Fredholm theory). In the quantum field theory setting, such functions are referred to as
propagator In quantum mechanics and quantum field theory, the propagator is a function that specifies the probability amplitude for a particle to travel from one place to another in a given period of time, or to travel with a certain energy and momentum. ...
s; higher-order correlators are called n-point functions; working with them defines the effective action of a theory. When the random variables are anti-commuting Grassmann numbers, the partition function can be expressed as a determinant of the operator ''D''. This is done by writing it as a
Berezin integral In mathematical physics, the Berezin integral, named after Felix Berezin, (also known as Grassmann integral, after Hermann Grassmann), is a way to define integration for functions of Grassmann variables (elements of the exterior algebra). It is ...
(also called Grassmann integral).
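For a finite-dimensional Gaussian, the statement that the correlator is the Green's function of ''D'' reduces to C = D^{-1}. The sketch below checks this with a made-up 2×2 positive-definite matrix standing in for the differential operator, integrating the Gaussian on a grid and differentiating log Z with respect to the sources.

```python
import numpy as np

# Hypothetical 2x2 positive-definite "operator" D (a matrix stands in for the
# differential operator in this finite-dimensional sketch).
D = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Grid for the integral Z(J) = ∫ exp(-1/2 x^T D x + J^T x) dx
xs = np.linspace(-8, 8, 401)
X1, X2 = np.meshgrid(xs, xs, indexing="ij")
pts = np.stack([X1, X2], axis=-1)
dx = xs[1] - xs[0]

def log_Z(J):
    quad = np.einsum("...i,ij,...j->...", pts, D, pts)
    integrand = np.exp(-0.5 * quad + pts @ J)
    return np.log(integrand.sum() * dx * dx)

# C(x_j, x_k) = d^2 log Z / dJ_j dJ_k at J = 0, by central differences
eps = 1e-3
C = np.empty((2, 2))
for j in range(2):
    for k in range(2):
        ej = np.eye(2)[j] * eps
        ek = np.eye(2)[k] * eps
        C[j, k] = (log_Z(ej + ek) - log_Z(ej - ek)
                   - log_Z(-ej + ek) + log_Z(-ej - ek)) / (4 * eps * eps)

# For a Gaussian, the correlator is the inverse (Green's function) of D.
assert np.allclose(C, np.linalg.inv(D), atol=1e-3)
```

This is the finite-dimensional shadow of the field-theory statement: inverting ''D'' is solving the associated linear equation, which is exactly what a Green's function does.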


General properties

Partition functions are used to discuss critical scaling and universality, and they are subject to the
renormalization group In theoretical physics, the term renormalization group (RG) refers to a formal apparatus that allows systematic investigation of the changes of a physical system as viewed at different scales. In particle physics, it reflects the changes in the ...
.


See also

*
Exponential family In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate ...
*
Partition function (statistical mechanics) In physics, a partition function describes the statistical properties of a system in thermodynamic equilibrium. Partition functions are functions of the thermodynamic state variables, such as the temperature and volume. Most of the aggregat ...
* Partition problem *
Markov random field In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph. In other words, a random field is said to b ...


References

{{reflist}}

Entropy and information