The partition function or configuration integral, as used in
probability theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
,
information theory
Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. ...
and
dynamical systems
In mathematics, a dynamical system is a system in which a function describes the time dependence of a point in an ambient space. Examples include the mathematical models that describe the swinging of a clock pendulum, the flow of water in ...
, is a generalization of the definition of a
partition function in statistical mechanics
In physics, a partition function describes the statistical properties of a system in thermodynamic equilibrium. Partition functions are functions of the thermodynamic state variables, such as the temperature and volume. Most of the aggregate ...
. It is a special case of a
normalizing constant
The concept of a normalizing constant arises in probability theory and a variety of other areas of mathematics. The normalizing constant is used to reduce any probability function to a probability density function with total probability of one.
...
in probability theory, for the
Boltzmann distribution
In statistical mechanics and mathematics, a Boltzmann distribution (also called Gibbs distribution Translated by J.B. Sykes and M.J. Kearsley. See section 28) is a probability distribution or probability measure that gives the probability ...
. The partition function occurs in many problems of probability theory because, in situations where there is a natural symmetry, its associated
probability measure
In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as ''countable additivity''. The difference between a probability measure and the more g ...
, the
Gibbs measure, has the
Markov property
In probability theory and statistics, the term Markov property refers to the memoryless property of a stochastic process. It is named after the Russian mathematician Andrey Markov. The term strong Markov property is similar to the Markov prop ...
. This means that the partition function occurs not only in physical systems with translation symmetry, but also in such varied settings as neural networks (the
Hopfield network
A Hopfield network (or Ising model of a neural network or Ising–Lenz–Little model) is a form of recurrent artificial neural network and a type of spin glass system popularised by John Hopfield in 1982 as described earlier by Little in 1974 b ...
), and applications such as
genomics
Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dim ...
,
corpus linguistics
Corpus linguistics is the study of a language as that language is expressed in its text corpus (plural ''corpora''), its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora ...
and
artificial intelligence
Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machine
A machine is a physical system using Power (physics), power to apply Force, forces and control Motion, moveme ...
, which employ
Markov networks, and
Markov logic network
A Markov logic network (MLN) is a probabilistic logic which applies the ideas of a Markov network to first-order logic, enabling uncertain inference. Markov logic networks generalize first-order logic, in the sense that, in a certain limit, all ...
s. The Gibbs measure is also the unique measure that has the property of maximizing the
entropy
Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodyna ...
for a fixed expectation value of the energy; this underlies the appearance of the partition function in
maximum entropy methods and the algorithms derived therefrom.
The partition function ties together many different concepts, and thus offers a general framework in which many different kinds of quantities may be calculated. In particular, it shows how to calculate
expectation values and
Green's function
In mathematics, a Green's function is the impulse response of an inhomogeneous linear differential operator defined on a domain with specified initial conditions or boundary conditions.
This means that if \operatorname is the linear differenti ...
s, forming a bridge to
Fredholm theory. It also provides a natural setting for the
information geometry approach to information theory, where the
Fisher information metric can be understood to be a
correlation function
A correlation function is a function that gives the statistical correlation between random variables, contingent on the spatial or temporal distance between those variables. If one considers the correlation function between random variables re ...
derived from the partition function; it happens to define a
Riemannian manifold
In differential geometry, a Riemannian manifold or Riemannian space , so called after the German mathematician Bernhard Riemann, is a real, smooth manifold ''M'' equipped with a positive-definite inner product ''g'p'' on the tangent spac ...
.
When the setting for random variables is on
complex projective space
In mathematics, complex projective space is the projective space with respect to the field of complex numbers. By analogy, whereas the points of a real projective space label the lines through the origin of a real Euclidean space, the points of ...
or
projective Hilbert space In mathematics and the foundations of quantum mechanics, the projective Hilbert space P(H) of a complex Hilbert space H is the set of equivalence classes of non-zero vectors v in H, for the relation \sim on H given by
:w \sim v if and only if v ...
, geometrized with the
Fubini–Study metric, the theory of
quantum mechanics
Quantum mechanics is a fundamental theory in physics that provides a description of the physical properties of nature at the scale of atoms and subatomic particles. It is the foundation of all quantum physics including quantum chemistry, q ...
and more generally
quantum field theory
In theoretical physics, quantum field theory (QFT) is a theoretical framework that combines classical field theory, special relativity, and quantum mechanics. QFT is used in particle physics to construct physical models of subatomic particles a ...
results. In these theories, the partition function is heavily exploited in the
path integral formulation
The path integral formulation is a description in quantum mechanics that generalizes the action principle of classical mechanics. It replaces the classical notion of a single, unique classical trajectory for a system with a sum, or functional ...
, with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued
simplex
In geometry, a simplex (plural: simplexes or simplices) is a generalization of the notion of a triangle or tetrahedron to arbitrary dimensions. The simplex is so-named because it represents the simplest possible polytope in any given dimension ...
of probability theory, an extra factor of ''i'' appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the sum of probabilities total to one.
Definition
Given a set of
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the p ...
s
taking on values
, and some sort of
potential function or
Hamiltonian , the partition function is defined as
:
The function ''H'' is understood to be a real-valued function on the space of states
, while
is a real-valued free parameter (conventionally, the
inverse temperature). The sum over the
is understood to be a sum over all possible values that each of the random variables
may take. Thus, the sum is to be replaced by an
integral
In mathematics, an integral assigns numbers to functions in a way that describes displacement, area, volume, and other concepts that arise by combining infinitesimal data. The process of finding integrals is called integration. Along with ...
when the
are continuous, rather than discrete. Thus, one writes
:
for the case of continuously-varying
.
When ''H'' is an
observable
In physics, an observable is a physical quantity that can be measured. Examples include position and momentum. In systems governed by classical mechanics, it is a real-valued "function" on the set of all possible system states. In quantum phys ...
, such as a finite-dimensional
matrix
Matrix most commonly refers to:
* ''The Matrix'' (franchise), an American media franchise
** '' The Matrix'', a 1999 science-fiction action film
** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchi ...
or an infinite-dimensional
Hilbert space
In mathematics, Hilbert spaces (named after David Hilbert) allow generalizing the methods of linear algebra and calculus from (finite-dimensional) Euclidean vector spaces to spaces that may be infinite-dimensional. Hilbert spaces arise natu ...
operator
Operator may refer to:
Mathematics
* A symbol indicating a mathematical operation
* Logical operator or logical connective in mathematical logic
* Operator (mathematics), mapping that acts on elements of a space to produce elements of another ...
or element of a
C-star algebra
In mathematics, specifically in functional analysis, a C∗-algebra (pronounced "C-star") is a Banach algebra together with an involution satisfying the properties of the adjoint. A particular case is that of a complex algebra ''A'' of cont ...
, it is common to express the summation as a
trace, so that
:
When ''H'' is infinite-dimensional, then, for the above notation to be valid, the argument must be
trace class In mathematics, specifically functional analysis, a trace-class operator is a linear operator for which a trace may be defined, such that the trace is a finite number independent of the choice of basis used to compute the trace. This trace of trace- ...
, that is, of a form such that the summation exists and is bounded.
The number of variables
need not be
countable
In mathematics, a set is countable if either it is finite or it can be made in one to one correspondence with the set of natural numbers. Equivalently, a set is ''countable'' if there exists an injective function from it into the natural number ...
, in which case the sums are to be replaced by
functional integrals. Although there are many notations for functional integrals, a common one would be
:
Such is the case for the
partition function in quantum field theory.
A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a
generating function
In mathematics, a generating function is a way of encoding an infinite sequence of numbers () by treating them as the coefficients of a formal power series. This series is called the generating function of the sequence. Unlike an ordinary ser ...
for
correlation function
A correlation function is a function that gives the statistical correlation between random variables, contingent on the spatial or temporal distance between those variables. If one considers the correlation function between random variables re ...
s. This is discussed in greater detail below.
The parameter β
The role or meaning of the parameter
can be understood in a variety of different ways. In classical thermodynamics, it is an
inverse temperature. More generally, one would say that it is the variable that is
conjugate to some (arbitrary) function
of the random variables
. The word ''conjugate'' here is used in the sense of conjugate
generalized coordinates
In analytical mechanics, generalized coordinates are a set of parameters used to represent the state of a system in a configuration space. These parameters must uniquely define the configuration of the system relative to a reference state.,p. 39 ...
in
Lagrangian mechanics
In physics, Lagrangian mechanics is a formulation of classical mechanics founded on the stationary-action principle (also known as the principle of least action). It was introduced by the Italian-French mathematician and astronomer Joseph-Lou ...
, thus, properly
is a
Lagrange multiplier
In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints (i.e., subject to the condition that one or more equations have to be satisfied ...
. It is not uncommonly called the
generalized force. All of these concepts have in common the idea that one value is meant to be kept fixed, as others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the
expectation value of
, even as many different
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...
s can give rise to exactly this same (fixed) value.
For the general case, one considers a set of functions
that each depend on the random variables
. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of
Lagrange multiplier
In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints (i.e., subject to the condition that one or more equations have to be satisfied ...
s. In the general case,
maximum entropy methods illustrate the manner in which this is done.
Some specific examples are in order. In basic thermodynamics problems, when using the
canonical ensemble
In statistical mechanics, a canonical ensemble is the statistical ensemble that represents the possible states of a mechanical system in thermal equilibrium with a heat bath at a fixed temperature. The system can exchange energy with the hea ...
, the use of just one parameter
reflects the fact that there is only one expectation value that must be held constant: the
free energy (due to
conservation of energy
In physics and chemistry, the law of conservation of energy states that the total energy of an isolated system remains constant; it is said to be ''conserved'' over time. This law, first proposed and tested by Émilie du Châtelet, means tha ...
). For chemistry problems involving chemical reactions, the
grand canonical ensemble provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the energy constant, and another, the
fugacity
In chemical thermodynamics, the fugacity of a real gas is an effective partial pressure which replaces the mechanical partial pressure in an accurate computation of the chemical equilibrium constant. It is equal to the pressure of an ideal gas ...
, is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms).
For the general case, one has
:
with
a point in a space.
For a collection of observables
, one would write
: