The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics. It is a special case of a normalizing constant in probability theory, for the Boltzmann distribution. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated probability measure, the Gibbs measure, has the Markov property. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the Hopfield network), and applications such as genomics, corpus linguistics and artificial intelligence, which employ Markov networks and Markov logic networks. The Gibbs measure is also the unique measure that has the property of maximizing the entropy for a fixed expectation value of the energy; this underlies the appearance of the partition function in maximum entropy methods and the algorithms derived therefrom.

The partition function ties together many different concepts, and thus offers a general framework in which many different kinds of quantities may be calculated. In particular, it shows how to calculate expectation values and Green's functions, forming a bridge to Fredholm theory. It also provides a natural setting for the information geometry approach to information theory, where the Fisher information metric can be understood to be a correlation function derived from the partition function; it happens to define a Riemannian manifold.

When the setting for random variables is on complex projective space or projective Hilbert space, geometrized with the Fubini–Study metric, the theory of quantum mechanics and more generally quantum field theory results. In these theories, the partition function is heavily exploited in the path integral formulation, with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued simplex of probability theory, an extra factor of ''i'' appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the probabilities sum to one.


Definition

Given a set of random variables X_i taking on values x_i, and some sort of potential function or Hamiltonian H(x_1,x_2,\dots), the partition function is defined as
:Z(\beta) = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) \right)
The function ''H'' is understood to be a real-valued function on the space of states \{X_1,X_2,\dots\}, while \beta is a real-valued free parameter (conventionally, the inverse temperature). The sum over the x_i is understood to be a sum over all possible values that each of the random variables X_i may take. Thus, the sum is to be replaced by an integral when the X_i are continuous, rather than discrete. In that case, one writes
:Z(\beta) = \int \exp \left(-\beta H(x_1,x_2,\dots) \right) \, dx_1 \, dx_2 \cdots

When ''H'' is an observable, such as a finite-dimensional matrix or an infinite-dimensional Hilbert space operator or element of a C-star algebra, it is common to express the summation as a trace, so that
:Z(\beta) = \operatorname{tr}\left(\exp\left(-\beta H\right)\right)
When ''H'' is infinite-dimensional, then, for the above notation to be valid, the argument must be trace class, that is, of a form such that the summation exists and is bounded.

The number of variables X_i need not be countable, in which case the sums are to be replaced by functional integrals. Although there are many notations for functional integrals, a common one would be
:Z = \int \mathcal{D} \varphi \exp \left(- \beta H[\varphi] \right)
Such is the case for the partition function in quantum field theory.

A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a generating function for correlation functions. This is discussed in greater detail below.
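For a small discrete system the defining sum can be evaluated by direct enumeration. The following is a minimal sketch, not a reference implementation: the function names and the choice of ''H'' (a nearest-neighbour coupling on an open chain of four ±1 variables) are assumptions made purely for illustration.

```python
import itertools
import math


def partition_function(beta, hamiltonian, values, n):
    """Sum exp(-beta * H(x)) over every assignment of n variables drawn from `values`."""
    return sum(
        math.exp(-beta * hamiltonian(x))
        for x in itertools.product(values, repeat=n)
    )


def chain_energy(x):
    """Illustrative H (an assumption): nearest-neighbour coupling on an open chain."""
    return -sum(a * b for a, b in zip(x, x[1:]))


# Four variables, each taking the values -1 or +1, at beta = 0.5.
Z = partition_function(beta=0.5, hamiltonian=chain_energy, values=(-1, 1), n=4)
print(Z)
```

For this particular choice of ''H'' the sum has the known closed form 2(2 cosh β)^(n−1), which gives an independent check on the enumeration. The cost grows exponentially with the number of variables, so brute-force enumeration is only practical for very small systems.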


The parameter β

The role or meaning of the parameter \beta can be understood in a variety of different ways. In classical thermodynamics, it is an inverse temperature. More generally, one would say that it is the variable that is conjugate to some (arbitrary) function H of the random variables X. The word ''conjugate'' here is used in the sense of conjugate generalized coordinates in Lagrangian mechanics; thus, properly, \beta is a Lagrange multiplier. It is not uncommonly called the generalized force. All of these concepts have in common the idea that one value is meant to be kept fixed, as others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the expectation value of H, even as many different probability distributions can give rise to exactly this same (fixed) value.

For the general case, one considers a set of functions \{H_k(x_1,\dots)\} that each depend on the random variables X_i. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of Lagrange multipliers. In the general case, maximum entropy methods illustrate the manner in which this is done.

Some specific examples are in order. In basic thermodynamics problems, when using the canonical ensemble, the use of just one parameter \beta reflects the fact that there is only one expectation value that must be held constant: the energy (due to conservation of energy). For chemistry problems involving chemical reactions, the grand canonical ensemble provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the energy constant, and another, the fugacity, is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms).

For the general case, one has
:Z(\beta) = \sum_{x_i} \exp \left(-\sum_k\beta_k H_k(x_i) \right)
with \beta=(\beta_1, \beta_2,\cdots) a point in a space. For a collection of observables H_k, one would write
:Z(\beta) = \operatorname{tr}\left[\,\exp \left(-\sum_k\beta_k H_k\right)\right]
As before, it is presumed that the argument of tr is trace class.

The corresponding Gibbs measure then provides a probability distribution such that the expectation value of each H_k is a fixed value. More precisely, one has
:\frac{\partial}{\partial \beta_k} \left(- \log Z \right) = \langle H_k\rangle = \mathrm{E}\left[H_k\right]
with the angle brackets \langle H_k \rangle denoting the expected value of H_k, and \mathrm{E}[\;] being a common alternative notation. A precise definition of this expectation value is given below.

Although the value of \beta is commonly taken to be real, it need not be, in general; this is discussed in the section Normalization below. The values of \beta can be understood to be the coordinates of points in a space; this space is in fact a manifold, as sketched below. The study of these spaces as manifolds constitutes the field of information geometry.
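The identity relating the derivative of −log ''Z'' to the expectation value of each H_k can be checked numerically on a toy system. The sketch below assumes two illustrative observables on four ±1 variables (a nearest-neighbour coupling and a field-like term), and compares a central finite difference of −log ''Z'' with the directly computed Gibbs average. All names and parameter values are hypothetical.

```python
import itertools
import math

# All configurations of four variables taking the values -1 or +1.
STATES = list(itertools.product((-1, 1), repeat=4))


def h1(x):
    """Illustrative observable H_1: nearest-neighbour coupling on an open chain."""
    return -sum(a * b for a, b in zip(x, x[1:]))


def h2(x):
    """Illustrative observable H_2: a field-like term, minus the sum of the variables."""
    return -sum(x)


OBSERVABLES = (h1, h2)


def weight(x, betas):
    """Unnormalized Gibbs weight exp(-sum_k beta_k H_k(x))."""
    return math.exp(-sum(b * h(x) for b, h in zip(betas, OBSERVABLES)))


def log_z(betas):
    return math.log(sum(weight(x, betas) for x in STATES))


def expectation(k, betas):
    """<H_k> computed directly as a weighted average over all configurations."""
    z = sum(weight(x, betas) for x in STATES)
    return sum(weight(x, betas) * OBSERVABLES[k](x) for x in STATES) / z


def minus_dlogz(k, betas, eps=1e-6):
    """Central finite-difference estimate of d(-log Z)/d(beta_k)."""
    up, down = list(betas), list(betas)
    up[k] += eps
    down[k] -= eps
    return -(log_z(up) - log_z(down)) / (2 * eps)


betas = (0.5, 0.2)
for k in range(len(OBSERVABLES)):
    print(k, expectation(k, betas), minus_dlogz(k, betas))
```

The two printed columns should agree up to the accuracy of the finite-difference approximation, one row per multiplier \beta_k.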


Symmetry

The potential function itself commonly takes the form of a sum:
:H(x_1,x_2,\dots) = \sum_s V(s)\,
where the sum over ''s'' is a sum over some subset of the power set ''P''(''X'') of the set X=\lbrace x_1,x_2,\dots \rbrace. For example, in statistical mechanics, such as the Ising model, the sum is over pairs of nearest neighbors. In probability theory, such as Markov networks, the sum might be over the cliques of a graph; so, for the Ising model and other lattice models, the maximal cliques are edges.

The fact that the potential function can be written as a sum usually reflects the fact that it is invariant under the action of a group symmetry, such as translational invariance. Such symmetries can be discrete or continuous; they materialize in the correlation functions for the random variables (discussed below). Thus a symmetry in the Hamiltonian becomes a symmetry of the correlation function (and vice versa).

This symmetry has a critically important interpretation in probability theory: it implies that the Gibbs measure has the Markov property; that is, it is independent of the random variables in a certain way, or, equivalently, the measure is identical on the equivalence classes of the symmetry. This leads to the widespread appearance of the partition function in problems with the Markov property, such as Hopfield networks.
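The way a symmetry of the Hamiltonian reappears in the correlation functions can be illustrated on a small translation-invariant system. The sketch below assumes a ring of six ±1 variables with nearest-neighbour coupling (an Ising-type ''H'' chosen only for illustration) and computes the two-point correlation by enumeration. The ring size, the value of \beta and the function names are all assumptions.

```python
import itertools
import math

N = 6       # number of sites on the ring (illustrative choice)
BETA = 0.7  # illustrative value of the parameter beta


def ring_energy(s):
    """H as a sum over nearest-neighbour pairs; site N-1 couples back to site 0."""
    return -sum(s[i] * s[(i + 1) % N] for i in range(N))


def correlation(i, j):
    """Two-point correlation <s_i s_j> under the Gibbs measure, by enumeration."""
    z = 0.0
    total = 0.0
    for s in itertools.product((-1, 1), repeat=N):
        w = math.exp(-BETA * ring_energy(s))
        z += w
        total += w * s[i] * s[j]
    return total / z


# Shifting both sites by the same amount leaves the correlation unchanged.
print(correlation(0, 2), correlation(3, 5))
print(correlation(0, 3), correlation(1, 4))
```

Because ''H'' is unchanged when every site index is shifted by one, the correlation depends only on the separation between the two sites, not on their absolute positions.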


As a measure

The value of the expression
:\exp \left(-\beta H(x_1,x_2,\dots) \right)
can be interpreted as a likelihood that a specific configuration of values (x_1,x_2,\dots) occurs in the system. Thus, given a specific configuration (x_1,x_2,\dots),
:P(x_1,x_2,\dots) = \frac{1}{Z(\beta)} \exp \left(-\beta H(x_1,x_2,\dots) \right)
is the probability of the configuration (x_1,x_2,\dots) occurring in the system, which is now properly normalized so that 0\le P(x_1,x_2,\dots)\le 1, and such that the sum over all configurations totals to one. As such, the partition function can be understood to provide a measure (a probability measure) on the probability space; formally, it is called the Gibbs measure. It generalizes the narrower concepts of the grand canonical ensemble and canonical ensemble in statistical mechanics.

There exists at least one configuration (x_1,x_2,\dots) for which the probability is maximized; this configuration is conventionally called the ground state. If the configuration is unique, the ground state is said to be non-degenerate, and the system is said to be ergodic; otherwise the ground state is degenerate. The ground state may or may not commute with the generators of the symmetry; if it commutes, it is said to be an invariant measure. When it does not commute, the symmetry is said to be spontaneously broken.

Conditions under which a ground state exists and is unique are given by the Karush–Kuhn–Tucker conditions; these conditions are commonly used to justify the use of the Gibbs measure in maximum-entropy problems.
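The normalization and the notion of a ground state can be made concrete on a toy example. The sketch below assumes three ±1 variables and two illustrative choices of ''H'': a pure nearest-neighbour coupling, whose ground state is degenerate, and the same coupling with an added field-like term, whose ground state is unique. Function names and parameter values are hypothetical.

```python
import itertools
import math


def gibbs_measure(beta, hamiltonian, states):
    """Return a dict mapping each configuration x to P(x) = exp(-beta * H(x)) / Z."""
    weights = {x: math.exp(-beta * hamiltonian(x)) for x in states}
    z = sum(weights.values())
    return {x: w / z for x, w in weights.items()}


def ground_states(probabilities, tol=1e-12):
    """Configurations whose probability is maximal, up to a small tolerance."""
    p_max = max(probabilities.values())
    return [x for x, p in probabilities.items() if p_max - p <= tol]


STATES = list(itertools.product((-1, 1), repeat=3))


def coupling(x):
    """Illustrative H: nearest-neighbour coupling on an open chain."""
    return -sum(a * b for a, b in zip(x, x[1:]))


def coupling_with_field(x):
    """Same coupling plus a field-like term that favours +1."""
    return coupling(x) - 0.5 * sum(x)


P = gibbs_measure(1.0, coupling, STATES)
P_field = gibbs_measure(1.0, coupling_with_field, STATES)

print(sum(P.values()))         # the probabilities sum to one, up to rounding
print(ground_states(P))        # two maximizers: the ground state is degenerate
print(ground_states(P_field))  # one maximizer: the ground state is non-degenerate
```

In the first case the all-plus and all-minus configurations are equally probable; the added field-like term singles out one of them.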


Normalization

The values taken by \beta depend on the mathematical space over which the random field varies. Thus, real-valued random fields take values on a simplex: this is the geometrical way of saying that the probabilities must sum to one. For quantum mechanics, the random variables range over complex projective space (or complex-valued projective Hilbert space), where the random variables are interpreted as probability amplitudes. The emphasis here is on the word ''projective'', as the amplitudes are still normalized to one. The normalization for the potential function is the Jacobian for the appropriate mathematical space: it is 1 for ordinary probabilities, and ''i'' for Hilbert space; thus, in quantum field theory, one sees ''it H'' in the exponential, rather than \beta H. The partition function is very heavily exploited in the path integral formulation of quantum field theory, to great effect. The theory there is very nearly identical to that presented here, aside from this difference, and the fact that it is usually formulated on four-dimensional space-time, rather than in a general way.


Expectation values

The partition function is commonly used as a probability-generating function for expectation values of various functions of the random variables. So, for example, taking \beta as an adjustable parameter, the negative derivative of \log(Z(\beta)) with respect to \beta
:\mathbf{E}[H] = \langle H \rangle = -\frac{\partial \log(Z(\beta))}{\partial \beta}
gives the average (expectation value) of ''H''. In physics, this would be called the average energy of the system. Given the definition of the probability measure above, the expectation value of any function ''f'' of the random variables ''X'' may now be written as expected: so, for discrete-valued ''X'', one writes
:\begin{align} \langle f\rangle & = \sum_{x_i} f(x_1,x_2,\dots) P(x_1,x_2,\dots) \\ & = \frac{1}{Z(\beta)} \sum_{x_i} f(x_1,x_2,\dots) \exp \left(-\beta H(x_1,x_2,\dots) \right) \end{align}
The above notation is strictly correct for a finite number of discrete random variables, but should be seen to be somewhat 'informal' for continuous variables; properly, the summations above should be replaced with the notations of the underlying
sigma algebra used to define a probability space. That said, the identities continue to hold, when properly formulated on a measure space. Thus, for example, the entropy is given by
:\begin{align} S & = -k_B \langle\ln P\rangle \\ & = -k_B\sum_{x_i} P(x_1,x_2,\dots) \ln P(x_1,x_2,\dots) \\ & = k_B(\beta \langle H\rangle + \log Z(\beta)) \end{align}
The Gibbs measure is the unique statistical distribution that maximizes the entropy for a fixed expectation value of the energy; this underlies its use in maximum entropy methods.
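The two identities above, \langle H \rangle = -\partial \log Z / \partial \beta and S = k_B(\beta \langle H\rangle + \log Z(\beta)), can be checked numerically on a small discrete system. The following is a minimal sketch, not a definitive implementation: the energy function `H`, the three binary variables, the value of `beta`, and the choice of units with k_B = 1 are all assumptions made purely for illustration.

```python
import math
from itertools import product

# All configurations of three binary variables (illustrative assumption).
STATES = list(product([0, 1], repeat=3))

def H(x):
    # Hypothetical energy function; any real-valued function of x would do.
    x1, x2, x3 = x
    return 1.0 * x1 + 0.5 * x2 - 0.3 * x1 * x3

def log_Z(beta):
    # log of the partition function Z(beta) = sum_x exp(-beta H(x)).
    return math.log(sum(math.exp(-beta * H(x)) for x in STATES))

def gibbs(beta):
    # Gibbs measure P(x) = exp(-beta H(x)) / Z(beta).
    lz = log_Z(beta)
    return {x: math.exp(-beta * H(x) - lz) for x in STATES}

def expectation(f, beta):
    # <f> = sum_x f(x) P(x).
    p = gibbs(beta)
    return sum(f(x) * p[x] for x in STATES)

beta = 0.7
eps = 1e-5

# <H> computed directly from the Gibbs measure.
mean_H = expectation(H, beta)

# Central finite difference for d(log Z)/d(beta).
dlogZ = (log_Z(beta + eps) - log_Z(beta - eps)) / (2 * eps)

# Entropy from its definition (k_B = 1) and from the closed form.
p = gibbs(beta)
entropy = -sum(px * math.log(px) for px in p.values())
entropy_from_Z = beta * mean_H + log_Z(beta)

print("<H> =", mean_H, " -dlogZ/dbeta =", -dlogZ)
print("S =", entropy, " beta<H> + log Z =", entropy_from_Z)
```

The finite-difference step `eps` is a compromise between truncation and round-off error; the two printed pairs should agree to many digits.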


Information geometry

The points \beta can be understood to form a space, and specifically, a manifold. Thus, it is reasonable to ask about the structure of this manifold; this is the task of information geometry. Multiple derivatives with regard to the Lagrange multipliers give rise to a positive semi-definite covariance matrix
:g_{ij}(\beta) = \frac{\partial^2}{\partial \beta^i \partial \beta^j} \log Z(\beta) = \langle \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right)\rangle
This matrix is positive semi-definite, and may be interpreted as a
metric tensor, specifically, a Riemannian metric. Equipping the space of Lagrange multipliers with a metric in this way turns it into a Riemannian manifold. The study of such manifolds is referred to as information geometry; the metric above is the Fisher information metric. Here, \beta serves as a coordinate on the manifold. It is interesting to compare the above definition to the simpler Fisher information
, from which it is inspired. That the above defines the Fisher information metric can be readily seen by explicitly substituting for the expectation value:
:\begin{align} g_{ij}(\beta) & = \langle \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right)\rangle \\ & = \sum_{x} P(x) \left(H_i-\langle H_i\rangle\right)\left( H_j-\langle H_j\rangle\right) \\ & = \sum_{x} P(x) \left(H_i + \frac{\partial\log Z}{\partial \beta^i}\right) \left(H_j + \frac{\partial\log Z}{\partial \beta^j}\right) \\ & = \sum_{x} P(x) \frac{\partial \log P(x)}{\partial \beta^i} \frac{\partial \log P(x)}{\partial \beta^j} \end{align}
where we've written P(x) for P(x_1,x_2,\dots) and the summation is understood to be over all values of all random variables X_k. For continuous-valued random variables, the summations are replaced by integrals, of course. Curiously, the
Fisher information metric can also be understood as the flat-space Euclidean metric, after appropriate change of variables, as described in the main article on it. When the \beta are complex-valued, the resulting metric is the Fubini–Study metric. When written in terms of mixed states, instead of pure states, it is known as the Bures metric.
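The relation between the covariance matrix, the second derivatives of \log Z(\beta), and the score-function form \sum_x P(x)\, \partial_i \log P(x)\, \partial_j \log P(x) can be illustrated on a toy model. The sketch below assumes a Gibbs measure P(x) \propto \exp(-\beta^1 H_1(x) - \beta^2 H_2(x)) on two spins; the observables `H1` and `H2`, the point `beta`, and the finite-difference step sizes are hypothetical choices, not taken from the text.

```python
import math
from itertools import product

# Two spins taking values -1 or +1 (illustrative assumption).
STATES = list(product([-1, 1], repeat=2))

def H1(x):
    # Hypothetical observable conjugate to beta[0].
    return x[0] * x[1]

def H2(x):
    # Hypothetical observable conjugate to beta[1].
    return x[0] + 0.5 * x[1]

OBS = (H1, H2)

def log_weight(beta, x):
    # Exponent of the unnormalised Gibbs weight: -sum_i beta_i H_i(x).
    return -sum(b * h(x) for b, h in zip(beta, OBS))

def log_Z(beta):
    return math.log(sum(math.exp(log_weight(beta, x)) for x in STATES))

def probs(beta):
    lz = log_Z(beta)
    return [math.exp(log_weight(beta, x) - lz) for x in STATES]

def covariance(beta):
    # g_ij = <(H_i - <H_i>)(H_j - <H_j>)> computed directly.
    p = probs(beta)
    mean = [sum(pk * h(x) for pk, x in zip(p, STATES)) for h in OBS]
    return [[sum(pk * (OBS[i](x) - mean[i]) * (OBS[j](x) - mean[j])
                 for pk, x in zip(p, STATES))
             for j in range(2)] for i in range(2)]

def shifted(beta, shifts):
    # Copy of beta with the listed (index, delta) shifts applied.
    b = list(beta)
    for i, d in shifts:
        b[i] += d
    return b

def hessian_log_Z(beta, eps=1e-3):
    # Central finite-difference Hessian of log Z.
    n = len(beta)
    def f(i, di, j, dj):
        return log_Z(shifted(beta, [(i, di), (j, dj)]))
    return [[(f(i, eps, j, eps) - f(i, eps, j, -eps)
              - f(i, -eps, j, eps) + f(i, -eps, j, -eps)) / (4 * eps * eps)
             for j in range(n)] for i in range(n)]

def fisher_from_scores(beta, eps=1e-6):
    # sum_x P(x) d_i log P(x) d_j log P(x), scores by central differences.
    n = len(beta)
    p = probs(beta)
    def log_p(b, x):
        return log_weight(b, x) - log_Z(b)
    scores = [[(log_p(shifted(beta, [(i, eps)]), x)
                - log_p(shifted(beta, [(i, -eps)]), x)) / (2 * eps)
               for i in range(n)] for x in STATES]
    return [[sum(pk * s[i] * s[j] for pk, s in zip(p, scores))
             for j in range(n)] for i in range(n)]

beta = (0.4, -0.3)
g_cov = covariance(beta)
g_hess = hessian_log_Z(beta)
g_score = fisher_from_scores(beta)
print(g_cov)
print(g_hess)
print(g_score)
```

With these conventions the three matrices should agree up to finite-difference error, and the result is symmetric with non-negative diagonal entries, as expected of a covariance matrix.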


Correlation functions

Introducing artificial auxiliary functions J_k into the partition function allows it to be used to obtain the expectation values of the random variables. Thus, for example, by writing
:\begin{align} Z(\beta,J) & = Z(\beta,J_1,J_2,\dots) \\ & = \sum_{x_i} \exp \left(-\beta H(x_1,x_2,\dots) + \sum_n J_n x_n \right) \end{align}
one then has
:\mathbf{E}[x_k] = \langle x_k \rangle = \left. \frac{\partial}{\partial J_k} \log Z(\beta,J)\right|_{J=0}
as the expectation value of x_k. In the
path integral formulation of quantum field theory, these auxiliary functions are commonly referred to as source fields. Multiple differentiations lead to the connected correlation functions of the random variables. Thus the correlation function C(x_j,x_k) between variables x_j and x_k is given by:
:C(x_j,x_k) = \left. \frac{\partial}{\partial J_j} \frac{\partial}{\partial J_k} \log Z(\beta,J)\right|_{J=0}
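The source-field construction can be tried out on a small system by differentiating \log Z(\beta,J) numerically at J = 0 and comparing with averages computed directly from the Gibbs measure. This is a rough sketch under stated assumptions: the three-spin chain energy `H`, the value of `beta`, and the finite-difference steps are hypothetical and chosen only to keep the example small.

```python
import math
from itertools import product

# Three spins taking values -1 or +1 (illustrative assumption).
STATES = list(product([-1, 1], repeat=3))

def H(x):
    # Hypothetical chain with nearest-neighbour coupling and a small field.
    return -(x[0] * x[1] + x[1] * x[2]) - 0.2 * x[0]

def log_Z(beta, J):
    # log Z(beta, J) with source term sum_n J_n x_n in the exponent.
    return math.log(sum(
        math.exp(-beta * H(x) + sum(j * xi for j, xi in zip(J, x)))
        for x in STATES))

def mean(f, beta):
    # <f> under the Gibbs measure at J = 0.
    w = [math.exp(-beta * H(x)) for x in STATES]
    z = sum(w)
    return sum(wi * f(x) for wi, x in zip(w, STATES)) / z

def d_logZ(beta, k, eps=1e-5):
    # First derivative of log Z with respect to J_k at J = 0.
    up = [0.0, 0.0, 0.0]
    dn = [0.0, 0.0, 0.0]
    up[k] = eps
    dn[k] = -eps
    return (log_Z(beta, up) - log_Z(beta, dn)) / (2 * eps)

def d2_logZ(beta, j, k, eps=1e-3):
    # Mixed second derivative of log Z with respect to J_j, J_k at J = 0.
    def at(dj, dk):
        J = [0.0, 0.0, 0.0]
        J[j] += dj
        J[k] += dk
        return log_Z(beta, J)
    return (at(eps, eps) - at(eps, -eps)
            - at(-eps, eps) + at(-eps, -eps)) / (4 * eps * eps)

beta = 0.5
mean_x0 = mean(lambda x: x[0], beta)
connected_02 = (mean(lambda x: x[0] * x[2], beta)
                - mean(lambda x: x[0], beta) * mean(lambda x: x[2], beta))

print("<x_0> =", mean_x0, " dlogZ/dJ_0 =", d_logZ(beta, 0))
print("C(x_0,x_2) =", connected_02, " d2logZ/dJ_0dJ_2 =", d2_logZ(beta, 0, 2))
```

Note that the second derivative reproduces the connected correlation \langle x_j x_k\rangle - \langle x_j\rangle\langle x_k\rangle, not the bare product average, because it is \log Z rather than Z that is differentiated.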


Gaussian integrals

For the case where ''H'' can be written as a quadratic form involving a differential operator, that is, as
:H = \frac{1}{2} \sum_n x_n D x_n
then the partition function can be understood to be a sum or integral over Gaussians. The correlation function C(x_j,x_k) can be understood to be the
Green's function for the differential operator (and generally giving rise to Fredholm theory). In the quantum field theory setting, such functions are referred to as propagators; higher order correlators are called n-point functions; working with them defines the effective action of a theory. When the random variables are anti-commuting Grassmann numbers, then the partition function can be expressed as a determinant of the operator ''D''. This is done by writing it as a Berezin integral (also called Grassmann integral).
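For ordinary (commuting) real variables, the statement that the correlation function is the Green's function of ''D'' can be illustrated in finite dimensions, where the Green's function is simply the matrix inverse. The sketch below makes several assumptions that are not in the text: ''D'' is replaced by a hypothetical 2×2 positive-definite matrix standing in for a discretized operator, the quadratic form is read as H = \frac{1}{2}\sum_{m,n} x_m D_{mn} x_n, \beta is set to 1, and the integrals are approximated by a plain Riemann sum on a grid.

```python
import math

# Hypothetical symmetric positive-definite matrix standing in for D.
D = [[2.0, -1.0],
     [-1.0, 2.0]]

def H(x):
    # Quadratic form H = (1/2) sum_{m,n} x_m D_mn x_n (assumed reading).
    return 0.5 * sum(x[m] * D[m][n] * x[n]
                     for m in range(2) for n in range(2))

def correlation_by_quadrature(half_width=8.0, step=0.1):
    # Riemann-sum estimate of Z and <x_j x_k> with weight exp(-H), beta = 1.
    n_pts = int(round(2 * half_width / step)) + 1
    grid = [-half_width + i * step for i in range(n_pts)]
    z = 0.0
    second = [[0.0, 0.0], [0.0, 0.0]]
    for a in grid:
        for b in grid:
            w = math.exp(-H((a, b)))
            z += w
            second[0][0] += w * a * a
            second[0][1] += w * a * b
            second[1][1] += w * b * b
    second[1][0] = second[0][1]
    corr = [[second[i][j] / z for j in range(2)] for i in range(2)]
    return corr, z * step * step

def inverse_2x2(m):
    # Closed-form inverse of a 2x2 matrix.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# The means vanish by symmetry, so <x_j x_k> is the connected correlation.
C, Z = correlation_by_quadrature()
G = inverse_2x2(D)
print("correlation:", C)
print("inverse of D:", G)
print("Z:", Z)
```

In this commuting two-variable case the partition function itself evaluates to 2\pi/\sqrt{\det D}, so the determinant of ''D'' already appears here with an inverse square-root power; the Grassmann case described above is not covered by this sketch.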


General properties

Partition functions are used to discuss critical scaling and universality, and are subject to the renormalization group.


See also

* Exponential family
* Partition function (statistical mechanics)
* Partition problem
* Markov random field

