Information field theory (IFT) is a Bayesian statistical field theory relating to signal reconstruction, cosmography, and other related areas. IFT summarizes the information available on a physical field using Bayesian probabilities. It uses computational techniques developed for quantum field theory and statistical field theory to handle the infinite number of degrees of freedom of a field and to derive algorithms for the calculation of field expectation values. For example, the posterior expectation value of a field generated by a known Gaussian process and measured by a linear device with known Gaussian noise statistics is given by a generalized Wiener filter applied to the measured data. IFT extends this known filter formula to situations with nonlinear physics, nonlinear devices, non-Gaussian field or noise statistics, dependence of the noise statistics on the field values, and partly unknown parameters of measurement. For this it uses Feynman diagrams, renormalisation flow equations, and other methods from mathematical physics.
Motivation
Fields play an important role in science, technology, and the economy. They describe the spatial variations of a quantity, like the air temperature, as a function of position. Knowing the configuration of a field can be of great value. Measurements of fields, however, can never provide the precise field configuration with certainty. Physical fields have an infinite number of degrees of freedom, but the data generated by any measurement device is always finite, providing only a finite number of constraints on the field. Thus, an unambiguous deduction of such a field from measurement data alone is impossible and only probabilistic inference remains as a means to make statements about the field. Fortunately, physical fields exhibit correlations and often follow known physical laws. Such information is best fused into the field inference in order to overcome the mismatch between the field degrees of freedom and the number of measurement points. To handle this, an information theory for fields is needed, and that is what information field theory is.
Concepts
Bayesian inference
Let $s(x)$ be a field value at a location $x \in \Omega$ in a space $\Omega$. The prior knowledge about the unknown signal field $s$ is encoded in the probability distribution $\mathcal{P}(s)$. The data $d$ provides additional information on $s$ via the likelihood $\mathcal{P}(d|s)$ that gets incorporated into the posterior probability
$$\mathcal{P}(s|d) = \frac{\mathcal{P}(d|s)\,\mathcal{P}(s)}{\mathcal{P}(d)}$$
according to Bayes' theorem.
Information Hamiltonian
In IFT Bayes' theorem is usually rewritten in the language of a statistical field theory,
$$\mathcal{P}(s|d) = \frac{\mathcal{P}(d|s)\,\mathcal{P}(s)}{\mathcal{P}(d)} \equiv \frac{e^{-\mathcal{H}(d,s)}}{\mathcal{Z}(d)},$$
with the information Hamiltonian defined as
$$\mathcal{H}(d,s) \equiv -\ln \mathcal{P}(d,s) = -\ln\left(\mathcal{P}(d|s)\,\mathcal{P}(s)\right) \equiv \mathcal{H}(d|s) + \mathcal{H}(s),$$
the negative logarithm of the joint probability of data and signal, and with the partition function being
$$\mathcal{Z}(d) \equiv \mathcal{P}(d) = \int \mathcal{D}s \, \mathcal{P}(d,s).$$
This reformulation of Bayes' theorem permits the usage of methods of mathematical physics developed for the treatment of statistical field theories and quantum field theories.
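As a minimal sketch of this reformulation, assume a field with a single degree of freedom, a zero-mean Gaussian prior of variance `S`, and a direct noisy measurement `d = s + n` with noise variance `N`. The model, the names, and the numbers are illustrative assumptions, not taken from the text. The snippet evaluates the information Hamiltonian on a grid, replaces the path integral for the partition function by a Riemann sum, and forms the posterior as $e^{-\mathcal{H}}/\mathcal{Z}$.

```python
import math

S, N = 2.0, 0.5  # assumed prior and noise variances (illustrative values)

def hamiltonian(d, s):
    """Information Hamiltonian H(d, s) = H(d|s) + H(s) for one pixel."""
    h_prior = 0.5 * s * s / S + 0.5 * math.log(2.0 * math.pi * S)
    h_likelihood = 0.5 * (d - s) ** 2 / N + 0.5 * math.log(2.0 * math.pi * N)
    return h_likelihood + h_prior

def posterior_on_grid(d, s_min=-12.0, s_max=12.0, n=24001):
    """Return the grid, the posterior e^{-H}/Z on it, Z(d), and the spacing."""
    ds = (s_max - s_min) / (n - 1)
    grid = [s_min + i * ds for i in range(n)]
    weights = [math.exp(-hamiltonian(d, s)) for s in grid]
    Z = sum(weights) * ds  # partition function Z(d) = P(d)
    return grid, [w / Z for w in weights], Z, ds

d = 1.5
grid, posterior, Z, ds = posterior_on_grid(d)
norm = sum(posterior) * ds
mean = sum(s * p for s, p in zip(grid, posterior)) * ds

# Closed-form references for this Gaussian toy model.
evidence = math.exp(-0.5 * d * d / (S + N)) / math.sqrt(2.0 * math.pi * (S + N))
wiener_mean = S / (S + N) * d

print("normalization:", norm)
print("posterior mean:", mean, "closed form:", wiener_mean)
print("Z(d):", Z, "closed form P(d):", evidence)
```

In this Gaussian toy case the grid results can be compared against the closed-form evidence and posterior mean, which is why those two reference values are computed alongside.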
Fields
As fields have an infinite number of degrees of freedom, the definition of probabilities over spaces of field configurations has subtleties. Identifying physical fields as elements of function spaces raises the problem that no Lebesgue measure is defined over the latter and therefore probability densities cannot be defined there. However, physical fields have much more regularity than most elements of function spaces, as they are continuous and smooth at most of their locations. Therefore less general, but sufficiently flexible constructions can be used to handle the infinite number of degrees of freedom of a field.
A pragmatic approach is to regard the field as discretized in terms of pixels. Each pixel carries a single field value that is assumed to be constant within the pixel volume. All statements about the continuous field then have to be cast into its pixel representation. This way, one deals with finite dimensional field spaces, over which probability densities are well definable.
In order for this description to be a proper field theory, it is further required that the pixel resolution $\Delta x$ can always be refined, while expectation values of the discretized field $s_{\Delta x}$ converge to finite values:
$$\langle f(s) \rangle_{(s|d)} \equiv \lim_{\Delta x \rightarrow 0} \int ds_{\Delta x} \, f(s_{\Delta x}) \, \mathcal{P}(s_{\Delta x}|d).$$
Path integrals
If this limit exists, one can talk about the field configuration space integral or path integral
$$\langle f(s) \rangle_{(s|d)} \equiv \int \mathcal{D}s \, f(s) \, \mathcal{P}(s|d),$$
irrespective of the resolution at which it might be evaluated numerically.
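The convergence requirement can be illustrated with a small numerical sketch. It assumes, purely for illustration, a zero-mean Gaussian field on the unit interval with covariance $S(x,y) = e^{-|x-y|/\ell}$ and looks at the expectation value of $f(s) = \left(\int_0^1 dx\, s(x)\right)^2$, taken here under that assumed Gaussian distribution rather than under a data-dependent posterior. In the pixel representation this expectation is a double sum over the pixelized covariance, and it should approach the continuum double integral as the pixels are refined. The function names and the correlation length are hypothetical choices.

```python
import math

def integral_variance(n_pix, corr_length=0.2):
    """Pixelized <(integral of s)^2> for a field on [0, 1].

    With pixel size dx = 1 / n_pix and pixel centres x_i, the expectation
    is sum_ij S(x_i, x_j) dx^2 for S(x, y) = exp(-|x - y| / corr_length).
    """
    dx = 1.0 / n_pix
    centres = [(i + 0.5) * dx for i in range(n_pix)]
    total = 0.0
    for x in centres:
        for y in centres:
            total += math.exp(-abs(x - y) / corr_length)
    return total * dx * dx

def continuum_value(corr_length=0.2):
    """Closed form of the double integral of S(x, y) over the unit square."""
    l = corr_length
    return 2.0 * (l - l * l * (1.0 - math.exp(-1.0 / l)))

exact = continuum_value()
errors = []
for n_pix in (10, 40, 160):
    estimate = integral_variance(n_pix)
    errors.append(abs(estimate - exact))
    print(n_pix, estimate, errors[-1])
```

The printed deviations from the continuum value shrink as the resolution increases, which is the behaviour the limit above demands of a well-defined field expectation value.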
Gaussian prior
The simplest prior for a field is that of a zero mean Gaussian probability distribution
$$\mathcal{P}(s) = \mathcal{G}(s,S) \equiv \frac{1}{\sqrt{|2\pi S|}} \, e^{-\frac{1}{2} s^\dagger S^{-1} s}.$$
The determinant in the denominator might be ill-defined in the continuum limit $\Delta x \rightarrow 0$. However, all that is necessary for IFT to be consistent is that this determinant can be estimated for any finite resolution field representation with $\Delta x > 0$ and that this permits the calculation of convergent expectation values.
A Gaussian probability distribution requires the specification of the field two point correlation function $S \equiv \langle s \, s^\dagger \rangle_{(s)}$ with coefficients
$$S_{xy} \equiv \langle s(x) \, \overline{s(y)} \rangle_{(s)}$$
and a scalar product for continuous fields
$$a^\dagger b \equiv \int_\Omega dx \, \overline{a(x)} \, b(x),$$
with respect to which the inverse signal field covariance $S^{-1}$ is constructed, i.e.
$$(S^{-1} S)_{xy} \equiv \int_\Omega dz \, (S^{-1})_{xz} S_{zy} = \mathbb{1}_{xy} \equiv \delta(x-y).$$
The corresponding prior information Hamiltonian reads
$$\mathcal{H}(s) = -\ln \mathcal{G}(s,S) = \frac{1}{2} s^\dagger S^{-1} s + \frac{1}{2} \ln |2\pi S|.$$
Measurement equation
The measurement data $d$ was generated with the likelihood $\mathcal{P}(d|s)$. In case the instrument was linear, a measurement equation of the form
$$d = R\,s + n$$
can be given, in which $R$ is the instrument response, which describes how the data on average reacts to the signal, and $n$ is the noise, simply the difference between data $d$ and linear signal response $R\,s$. It is essential to note that the response translates the infinite dimensional signal vector into the finite dimensional data space. In components this reads
$$d_i = \int_\Omega dx \, R_{ix} \, s_x + n_i,$$
where a vector component notation was also introduced for signal and data vectors.
If the noise follows signal independent zero mean Gaussian statistics with covariance $N$, $\mathcal{P}(n|s) = \mathcal{G}(n,N)$, then the likelihood is Gaussian as well,
$$\mathcal{P}(d|s) = \mathcal{G}(d - R\,s, N),$$
and the likelihood information Hamiltonian is
$$\mathcal{H}(d|s) = -\ln \mathcal{G}(d - R\,s, N) = \frac{1}{2} (d - R\,s)^\dagger N^{-1} (d - R\,s) + \frac{1}{2} \ln |2\pi N|.$$
A linear measurement of a Gaussian signal, subject to Gaussian and signal independent noise, leads to a free IFT.
Free theory
Free Hamiltonian
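The joint information Hamiltonian of the Gaussian scenario described above is
$$\begin{aligned}
\mathcal{H}(d,s) &= \mathcal{H}(d|s) + \mathcal{H}(s) \\
&\;\widehat{=}\; \frac{1}{2} (d - R\,s)^\dagger N^{-1} (d - R\,s) + \frac{1}{2} s^\dagger S^{-1} s \\
&\;\widehat{=}\; \frac{1}{2} \left[ s^\dagger \left( S^{-1} + R^\dagger N^{-1} R \right) s - s^\dagger R^\dagger N^{-1} d - d^\dagger N^{-1} R \, s \right] \\
&\equiv \frac{1}{2} \left[ s^\dagger D^{-1} s - s^\dagger j - j^\dagger s \right] \\
&\;\widehat{=}\; \frac{1}{2} (s - m)^\dagger D^{-1} (s - m),
\end{aligned}$$
with $D^{-1} = S^{-1} + R^\dagger N^{-1} R$, $j = R^\dagger N^{-1} d$, and $m = D\,j$, and where $\widehat{=}$ denotes equality up to irrelevant constants, which, in this case, means expressions that are independent of $s$. From this it is clear that the posterior must be a Gaussian with mean $m$ and variance $D$,
$$\mathcal{P}(s|d) \propto e^{-\mathcal{H}(d,s)} \propto e^{-\frac{1}{2} (s-m)^\dagger D^{-1} (s-m)} \propto \mathcal{G}(s - m, D),$$
where equality between the right and left hand sides holds as both distributions are normalized, $\int \mathcal{D}s \, \mathcal{P}(s|d) = 1 = \int \mathcal{D}s \, \mathcal{G}(s - m, D)$.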
Generalized Wiener filter
The posterior mean
$$m = D\,j = \left( S^{-1} + R^\dagger N^{-1} R \right)^{-1} R^\dagger N^{-1} d$$
is also known as the generalized Wiener filter solution and the uncertainty covariance
$$D = \left( S^{-1} + R^\dagger N^{-1} R \right)^{-1}$$
as the Wiener variance.
In IFT, $j = R^\dagger N^{-1} d$ is called the information source, as it acts as a source term to excite the field (knowledge), and $D$ the information propagator, as it propagates information from one location to another in
$$m_x = \int_\Omega dy \, D_{xy} \, j_y.$$
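The generalized Wiener filter can be sketched for a pixelized field as follows. The example assumes three signal pixels, two data points, an exponential prior covariance, and hand-picked response and noise matrices; all of these, and the helper names, are illustrative assumptions. To stay dependency-free, the snippet uses plain nested lists and a small Gauss-Jordan inverse; in practice a numerical library would be the natural choice.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def wiener_filter(d, R, S, N):
    """Return posterior mean m = D j, propagator D, and source j for d = R s + n."""
    Rt_Ninv = matmul(transpose(R), inverse(N))
    D = inverse(add(inverse(S), matmul(Rt_Ninv, R)))   # information propagator
    j = matmul(Rt_Ninv, [[x] for x in d])               # information source
    m = [row[0] for row in matmul(D, j)]
    return m, D, [row[0] for row in j]

# Assumed toy setup: 3 signal pixels, 2 measurements.
S = [[math.exp(-abs(i - k)) for k in range(3)] for i in range(3)]  # prior covariance
R = [[1.0, 0.0, 0.0],      # first datum sees pixel 0
     [0.0, 0.5, 0.5]]      # second datum averages pixels 1 and 2
N = [[0.1, 0.0],
     [0.0, 0.2]]           # noise covariance
d = [1.0, -0.5]

m, D, j = wiener_filter(d, R, S, N)
print("posterior mean:", m)
print("posterior variances:", [D[i][i] for i in range(3)])
```

A useful cross-check, under the same assumptions, is the algebraically equivalent data-space form $m = S R^\dagger (R S R^\dagger + N)^{-1} d$, which only requires inverting a matrix of the size of the data.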
Interacting theory
Interacting Hamiltonian
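If any of the assumptions that lead to the free theory is violated, IFT becomes an interacting theory, with terms that are of higher than quadratic order in the signal field. This happens when the signal or the noise do not follow Gaussian statistics, when the response is non-linear, when the noise depends on the signal, or when response or covariances are uncertain.
In this case, the information Hamiltonian might be expandable in a Taylor-Fréchet series,
$$\mathcal{H}(d,s) = \underbrace{\frac{1}{2} s^\dagger D^{-1} s - j^\dagger s + \mathcal{H}_0}_{=\,\mathcal{H}_\text{free}(d,s)} + \underbrace{\sum_{n=3}^{\infty} \frac{1}{n!} \Lambda^{(n)}_{x_1 \ldots x_n} s_{x_1} \cdots s_{x_n}}_{=\,\mathcal{H}_\text{int}(d,s)},$$
where repeated coordinates are integrated over, $\mathcal{H}_\text{free}(d,s)$ is the free Hamiltonian, which alone would lead to a Gaussian posterior, and $\mathcal{H}_\text{int}(d,s)$ is the interacting Hamiltonian, which encodes non-Gaussian corrections. The first and second order Taylor coefficients are often identified with the (negative) information source $-j$ and the (inverse) information propagator $D^{-1}$, respectively. The higher coefficients $\Lambda^{(n)}_{x_1 \ldots x_n}$ are associated with non-linear self-interactions.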
Classical field
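The classical field $s_\text{cl}$ minimizes the information Hamiltonian,
$$\left. \frac{\partial \mathcal{H}(d,s)}{\partial s} \right|_{s = s_\text{cl}} = 0,$$
and therefore maximizes the posterior:
$$\left. \frac{\partial \mathcal{P}(s|d)}{\partial s} \right|_{s = s_\text{cl}} = \left. \frac{\partial}{\partial s} \, \frac{e^{-\mathcal{H}(d,s)}}{\mathcal{Z}(d)} \right|_{s = s_\text{cl}} = - \mathcal{P}(s|d) \left. \frac{\partial \mathcal{H}(d,s)}{\partial s} \right|_{s = s_\text{cl}} = 0.$$
The classical field $s_\text{cl}$ is therefore the maximum a posteriori estimator of the field inference problem.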
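A minimal sketch of a classical field computation, assuming a one-pixel toy model that is not from the text: a zero-mean Gaussian prior of variance `S`, a non-linear response `d = exp(s) + n`, and Gaussian noise of variance `N`. The Hamiltonian is then non-quadratic in `s`, and its minimum is located here by a plain Newton iteration started at the data-matching value `s = ln d`, where this particular Hamiltonian is locally convex. Other models may need a safeguarded minimizer.

```python
import math

S, N, d = 1.0, 0.25, 2.0  # assumed prior variance, noise variance, datum

def H(s):
    """Information Hamiltonian up to s-independent constants."""
    return 0.5 * s * s / S + 0.5 * (d - math.exp(s)) ** 2 / N

def grad_H(s):
    return s / S - (d - math.exp(s)) * math.exp(s) / N

def hess_H(s):
    return 1.0 / S + (2.0 * math.exp(2.0 * s) - d * math.exp(s)) / N

def classical_field(s, tol=1e-13, max_iter=50):
    """Newton iteration on the stationarity condition dH/ds = 0."""
    for _ in range(max_iter):
        step = grad_H(s) / hess_H(s)
        s -= step
        if abs(step) < tol:
            break
    return s

s_cl = classical_field(math.log(d))
print("classical field:", s_cl)
print("gradient at s_cl:", grad_H(s_cl))
```

In this toy case the prior pulls the estimate from the data-matching value `ln d` towards zero, so the classical field lies between the two.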
Critical filter
The Wiener filter problem requires the two point correlation $S \equiv \langle s \, s^\dagger \rangle_{(s)}$ of a field to be known. If it is unknown, it has to be inferred along with the field itself. This requires the specification of a hyperprior $\mathcal{P}(S)$. Often, statistical homogeneity (translation invariance) can be assumed, implying that $S$ is diagonal in Fourier space (for $\Omega = \mathbb{R}^u$ being a $u$ dimensional Cartesian space). In this case, only the Fourier space power spectrum $P_s(\vec{k})$ needs to be inferred. Given a further assumption of statistical isotropy, this spectrum depends only on the length $k = |\vec{k}|$ of the Fourier vector $\vec{k}$ and only a one dimensional spectrum $P_s(k)$ has to be determined. The prior field covariance then reads in Fourier space coordinates $S_{\vec{k}\vec{q}} = (2\pi)^u \, \delta(\vec{k} - \vec{q}) \, P_s(k)$.
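If the prior on $P_s(k)$ is flat, the joint probability of data and spectrum is
$$\begin{aligned}
\mathcal{P}(d, P_s) &= \int \mathcal{D}s \, \mathcal{P}(d, s, P_s) = \int \mathcal{D}s \, \mathcal{P}(d|s) \, \mathcal{P}(s|P_s) \, \mathcal{P}(P_s) \\
&\propto \frac{1}{|S|^{1/2}} \int \mathcal{D}s \, \exp\left[ -\frac{1}{2} \left( s^\dagger D^{-1} s - j^\dagger s - s^\dagger j \right) \right] \\
&\propto \frac{|D|^{1/2}}{|S|^{1/2}} \exp\left[ \frac{1}{2} j^\dagger D \, j \right],
\end{aligned}$$
where the notation of the information propagator $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$ and source $j = R^\dagger N^{-1} d$ of the Wiener filter problem was used again. The corresponding information Hamiltonian is
$$\mathcal{H}(d, P_s) \;\widehat{=}\; \frac{1}{2} \left[ \ln |S \, D^{-1}| - j^\dagger D \, j \right] = \frac{1}{2} \mathrm{Tr} \left[ \ln \left( S \, D^{-1} \right) - j \, j^\dagger D \right],$$
where $\widehat{=}$ denotes equality up to irrelevant constants (here: constant with respect to $P_s$). Minimizing this with respect to $P_s(k)$, in order to get its maximum a posteriori power spectrum estimator, yields
$$\begin{aligned}
\frac{\partial \mathcal{H}(d, P_s)}{\partial P_s(k)} &= \frac{1}{2} \mathrm{Tr} \left[ D \, S^{-1} \frac{\partial \left( S \, D^{-1} \right)}{\partial P_s(k)} - j \, j^\dagger \frac{\partial D}{\partial P_s(k)} \right] \\
&= \frac{1}{2} \mathrm{Tr} \left[ \left( R^\dagger N^{-1} R \, D \, S^{-1} - S^{-1} m \, m^\dagger S^{-1} \right) \frac{\partial S}{\partial P_s(k)} \right] \\
&= \frac{1}{2} \mathrm{Tr} \left\{ \left[ S^{-1} - S^{-1} D \, S^{-1} - S^{-1} m \, m^\dagger S^{-1} \right] \mathbb{P}_k \right\} \\
&= \frac{1}{2} \left[ \frac{\rho_k}{P_s(k)} - \frac{\mathrm{Tr} \left[ \left( m \, m^\dagger + D \right) \mathbb{P}_k \right]}{[P_s(k)]^2} \right] = 0,
\end{aligned}$$
where the Wiener filter mean $m = D\,j$, the spectral band projector $(\mathbb{P}_k)_{\vec{q}\vec{q}'} \equiv (2\pi)^u \, \delta(\vec{q} - \vec{q}') \, \delta(|\vec{q}| - k)$, and its trace $\rho_k = \mathrm{Tr}\,\mathbb{P}_k$ were introduced, and $R^\dagger N^{-1} R = D^{-1} - S^{-1}$ was used. The projector commutes with $S^{-1}$, since $(S^{-1})_{\vec{k}\vec{q}} = (2\pi)^u \, \delta(\vec{k} - \vec{q}) \, [P_s(k)]^{-1}$ is diagonal in Fourier space. The maximum a posteriori estimator for the power spectrum is therefore
$$P_s(k) = \frac{\mathrm{Tr} \left[ \left( m \, m^\dagger + D \right) \mathbb{P}_k \right]}{\rho_k}.$$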
It has to be calculated iteratively, as $m = D\,j$ and $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$ both depend on $P_s$ themselves. In an empirical Bayes approach, the estimated $P_s$ would be taken as given. As a consequence, the posterior mean estimate for the signal field is the corresponding $m$ and its uncertainty the corresponding $D$ in the empirical Bayes approximation.
The resulting non-linear filter is called the critical filter. The generalization of the power spectrum estimation formula as
$$P_s(k) = \frac{\mathrm{Tr} \left[ \left( m \, m^\dagger + \delta \, D \right) \mathbb{P}_k \right]}{\rho_k}$$
exhibits a perception threshold for $\delta < 1$, meaning that the data variance in a Fourier band has to exceed the expected noise level by a certain threshold before the signal reconstruction $m$ becomes non-zero for this band. Whenever the data variance exceeds this threshold slightly, the signal reconstruction jumps to a finite excitation level, similar to a first order phase transition in thermodynamic systems. For a filter with $\delta = 1$, perception of the signal starts continuously as soon as the data variance exceeds the noise level. The disappearance of the discontinuous perception at $\delta = 1$ is similar to a thermodynamic system going through a critical point. Hence the name critical filter.
The critical filter, extensions thereof to non-linear measurements, and the inclusion of non-flat spectrum priors, permitted the application of IFT to real world signal inference problems, for which the signal covariance is usually unknown a priori.
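The iteration and the perception threshold can be sketched for a single Fourier band under strong simplifying assumptions that are not from the text: unit response, white noise of variance `noise_var`, and a band containing a handful of modes with given data values. Then the propagator is diagonal, each mode is Wiener filtered independently, and the spectrum update $P_s(k) = \mathrm{Tr}[(m\,m^\dagger + \delta D)\,\mathbb{P}_k]/\rho_k$ reduces to an average over the modes of the band. The function name and the data values are hypothetical.

```python
def critical_filter_band(data, noise_var, delta=1.0, n_iter=500):
    """Fixed-point iteration for the power P of one Fourier band.

    Assumes R = 1 and white noise, so that for every mode q in the band
    D_q = (1/P + 1/noise_var)^-1 and m_q = D_q * d_q / noise_var.
    Returns the converged power and the corresponding filtered modes.
    """
    rho = len(data)                               # number of modes in the band
    P = sum(abs(x) ** 2 for x in data) / rho      # start from the data variance
    for _ in range(n_iter):
        gain = P / (P + noise_var)                # equals D_q / noise_var
        D = gain * noise_var                      # Wiener variance per mode
        m = [gain * x for x in data]              # Wiener filter mean per mode
        P = sum(abs(x) ** 2 + delta * D for x in m) / rho
    gain = P / (P + noise_var)
    return P, [gain * x for x in data]

noise_var = 1.0
strong = [3.0, -1.0, 2.0, -2.0]   # data variance 4.5
weak = [2.0, -2.0, 2.0, 0.0]      # data variance 3.0

for name, band in (("strong", strong), ("weak", weak)):
    for delta in (1.0, 0.0):
        P, m = critical_filter_band(band, noise_var, delta)
        print(name, "delta =", delta, "P =", P, "m =", m)
```

In this toy setting the fixed point for `delta = 1` works out to the data variance minus the noise variance, so the band is perceived as soon as the data variance exceeds the noise level. For `delta = 0` a non-zero fixed point exists only once the data variance exceeds four times the noise variance, so the weak band above is suppressed entirely while the strong band jumps to a finite power.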
IFT application examples
The generalized Wiener filter, which emerges in free IFT, is in broad usage in signal processing. Algorithms explicitly based on IFT were derived for a number of applications. Many of them are implemented using the Numerical Information Field Theory (NIFTy) library.
* D³PO is a code for ''Denoising, Deconvolving, and Decomposing Photon Observations''. It reconstructs images from individual photon count events, taking into account the Poisson statistics of the counts and an instrument response function. It splits the sky emission into an image of diffuse emission and one of point sources, exploiting the different correlation structure and statistics of the two components for their separation. D³PO has been applied to data of the Fermi and the RXTE satellites.
* RESOLVE is a Bayesian algorithm for aperture synthesis imaging in radio astronomy. RESOLVE is similar to D³PO, but it assumes a Gaussian likelihood and a Fourier space response function. It has been applied to data of the Very Large Array.
* PySESA is a ''Python framework for Spatially Explicit Spectral Analysis'' of point clouds and geospatial data.
Advanced theory
Many techniques from quantum field theory can be used to tackle IFT problems, like Feynman diagrams, effective actions, and the field operator formalism.
Feynman diagrams
In case the interaction coefficients $\Lambda^{(n)}$ in a Taylor-Fréchet expansion of the information Hamiltonian
$$\mathcal{H}(d,s) = \underbrace{\frac{1}{2} s^\dagger D^{-1} s - j^\dagger s + \mathcal{H}_0}_{=\,\mathcal{H}_\text{free}(d,s)} + \underbrace{\sum_{n=3}^{\infty} \frac{1}{n!} \Lambda^{(n)}_{x_1 \ldots x_n} s_{x_1} \cdots s_{x_n}}_{=\,\mathcal{H}_\text{int}(d,s)}$$
are small, the log partition function, which is the negative of the Helmholtz free energy at unit temperature,
$$\ln \mathcal{Z}(d) = \ln \int \mathcal{D}s \, e^{-\mathcal{H}(d,s)} = \sum_{c \in C} c,$$
can be expanded asymptotically in terms of these coefficients. The free Hamiltonian specifies the mean $m = D\,j$ and variance $D$ of the Gaussian distribution $\mathcal{G}(s - m, D)$ over which the expansion is integrated. This leads to a sum over the set $C$ of all connected Feynman diagrams. From the log partition function, any connected moment of the field can be calculated via
$$\langle s_{x_1} \cdots s_{x_n} \rangle^\text{c}_{(s|d)} = \frac{\partial^n \ln \mathcal{Z}}{\partial j_{x_1} \cdots \partial j_{x_n}}.$$
Small expansion parameters, which are needed for such a diagrammatic expansion to converge, exist for nearly Gaussian signal fields, where the non-Gaussianity of the field statistics leads to small interaction coefficients $\Lambda^{(n)}$. For example, the statistics of the Cosmic Microwave Background is nearly Gaussian, with small amounts of non-Gaussianities believed to be seeded during the inflationary epoch in the Early Universe.
Effective action
For numerically stable solutions of IFT problems, a field functional is needed whose minimum provides the posterior mean field. Such a functional is given by the effective action or Gibbs free energy of a field. The Gibbs free energy $G$ can be constructed from the Helmholtz free energy via a Legendre transformation.
In IFT, it is given by the difference of the internal information energy
$$U = \langle \mathcal{H}(d,s) \rangle_{\mathcal{P}'(s|d')}$$
and the Shannon entropy
$$\mathcal{S} = -\int \mathcal{D}s \, \mathcal{P}'(s|d') \, \ln \mathcal{P}'(s|d')$$
for temperature $T = 1$, where a Gaussian posterior approximation $\mathcal{P}'(s|d') = \mathcal{G}(s - m, D)$ is used with the approximate data $d' = (m, D)$ containing the mean and the dispersion of the field.
The Gibbs free energy is then
$$\begin{aligned}
G(m,D) &= U(m,D) - T \, \mathcal{S}(m,D) \\
&= \langle \mathcal{H}(d,s) + \ln \mathcal{P}'(s|d') \rangle_{\mathcal{P}'(s|d')} \\
&= \int \mathcal{D}s \, \mathcal{P}'(s|d') \, \ln \frac{\mathcal{P}'(s|d')}{\mathcal{P}(d,s)} \\
&= \int \mathcal{D}s \, \mathcal{P}'(s|d') \, \ln \frac{\mathcal{P}'(s|d')}{\mathcal{P}(s|d)} - \ln \mathcal{P}(d) \\
&= \mathrm{KL}\left( \mathcal{P}'(s|d'), \mathcal{P}(s|d) \right) - \ln \mathcal{Z}(d),
\end{aligned}$$
the Kullback-Leibler divergence between approximate and exact posterior plus the Helmholtz free energy $-\ln \mathcal{Z}(d)$. As the latter does not depend on the approximate data $d' = (m, D)$, minimizing the Gibbs free energy is equivalent to minimizing the Kullback-Leibler divergence between approximate and exact posterior. Thus, the effective action approach of IFT is equivalent to the variational Bayesian methods, which also minimize the Kullback-Leibler divergence between approximate and exact posteriors.
Minimizing the Gibbs free energy provides approximately the posterior mean field
$$\langle s \rangle_{(s|d)} = \int \mathcal{D}s \, s \, \mathcal{P}(s|d),$$
whereas minimizing the information Hamiltonian provides the maximum a posteriori field. As the latter is known to over-fit noise, the former is usually a better field estimator.
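A minimal sketch of the Gibbs free energy construction, assuming a one-pixel toy model that is not from the text: a zero-mean Gaussian prior of variance `S`, a non-linear response `d = exp(s) + n`, and Gaussian noise of variance `N`. For a Gaussian approximation with mean `m` and dispersion `D`, the internal energy is available in closed form because the Gaussian averages of `exp(s)` and `exp(2 s)` are known. The minimizer below is a simple finite-difference gradient descent with step halving, chosen for brevity rather than efficiency, and the Helmholtz free energy is obtained by brute-force quadrature for comparison.

```python
import math

S, N, d = 1.0, 0.25, 2.0  # assumed prior variance, noise variance, datum

def H(s):
    """Information Hamiltonian up to s-independent constants."""
    return 0.5 * s * s / S + 0.5 * (d - math.exp(s)) ** 2 / N

def gibbs(m, log_D):
    """G(m, D) = U - T*entropy at T = 1 for the Gaussian G(s - m, D)."""
    try:
        D = math.exp(log_D)
        U = 0.5 * (m * m + D) / S + 0.5 * (
            d * d - 2.0 * d * math.exp(m + 0.5 * D) + math.exp(2.0 * m + 2.0 * D)
        ) / N
    except OverflowError:
        return math.inf  # reject trial points far outside the sensible range
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * D)
    return U - entropy

def minimize_gibbs(m=0.0, log_D=-2.0, n_iter=2000, eps=1e-6):
    """Gradient descent on (m, ln D) that only accepts improving steps."""
    g_val = gibbs(m, log_D)
    for _ in range(n_iter):
        gm = (gibbs(m + eps, log_D) - gibbs(m - eps, log_D)) / (2.0 * eps)
        gt = (gibbs(m, log_D + eps) - gibbs(m, log_D - eps)) / (2.0 * eps)
        t = 1.0
        while t > 1e-12:
            trial = gibbs(m - t * gm, log_D - t * gt)
            if trial < g_val:
                m, log_D, g_val = m - t * gm, log_D - t * gt, trial
                break
            t *= 0.5
        else:
            break  # no improving step found: treat as converged
    return m, math.exp(log_D), g_val

def helmholtz(s_min=-15.0, s_max=5.0, n=200001):
    """F = -ln Z with Z from a Riemann sum over e^{-H}."""
    ds = (s_max - s_min) / (n - 1)
    Z = sum(math.exp(-H(s_min + i * ds)) for i in range(n)) * ds
    return -math.log(Z)

G_start = gibbs(0.0, -2.0)
m_opt, D_opt, G_min = minimize_gibbs()
F = helmholtz()
print("approximate posterior mean and dispersion:", m_opt, D_opt)
print("Gibbs free energy:", G_min, "Helmholtz free energy:", F)
print("remaining Kullback-Leibler divergence:", G_min - F)
```

Since the Gibbs free energy equals the Kullback-Leibler divergence plus the Helmholtz free energy, the printed difference is non-negative, and it measures how far the best Gaussian found by this simple minimizer is from the exact posterior of the toy model.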
Operator formalism
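The calculation of the Gibbs free energy requires the calculation of Gaussian integrals over an information Hamiltonian, since the internal information energy is
$$U(m,D) = \langle \mathcal{H}(d,s) \rangle_{\mathcal{P}'(s|d')} = \int \mathcal{D}s \, \mathcal{H}(d,s) \, \mathcal{G}(s - m, D).$$
Such integrals can be calculated via a field operator formalism, in which
$$O_m = m + D \, \frac{\mathrm{d}}{\mathrm{d}m}$$
is the field operator. This generates the field expression $s$ within the integral if applied to the Gaussian distribution function,
$$\begin{aligned}
O_m \, \mathcal{G}(s - m, D) &= \left( m + D \, \frac{\mathrm{d}}{\mathrm{d}m} \right) \frac{1}{|2\pi D|^{1/2}} \exp\left[ -\frac{1}{2} (s - m)^\dagger D^{-1} (s - m) \right] \\
&= \left( m + D \, D^{-1} (s - m) \right) \mathcal{G}(s - m, D) \\
&= s \, \mathcal{G}(s - m, D),
\end{aligned}$$
and any higher power of the field if applied several times,
$$(O_m)^n \, \mathcal{G}(s - m, D) = s^n \, \mathcal{G}(s - m, D).$$
If the information Hamiltonian is analytic, all its terms can be generated via the field operator,
$$\mathcal{H}(d, O_m) \, \mathcal{G}(s - m, D) = \mathcal{H}(d, s) \, \mathcal{G}(s - m, D).$$
As the field operator does not depend on the field $s$ itself, it can be pulled out of the path integral of the internal information energy construction,
$$U(m,D) = \int \mathcal{D}s \, \mathcal{H}(d, O_m) \, \mathcal{G}(s - m, D) = \mathcal{H}(d, O_m) \int \mathcal{D}s \, \mathcal{G}(s - m, D) = \mathcal{H}(d, O_m) \, 1_m,$$
where $1_m = 1$ should be regarded as a functional that always returns the value $1$ irrespective of the value of its input $m$. The resulting expression can be calculated by commuting the mean field annihilators $D \, \frac{\mathrm{d}}{\mathrm{d}m}$ to the right of the expression, where they vanish since $\frac{\mathrm{d}}{\mathrm{d}m} 1_m = 0$. The mean field annihilator $D \, \frac{\mathrm{d}}{\mathrm{d}m}$ commutes with the mean field as
$$\left[ D \, \frac{\mathrm{d}}{\mathrm{d}m}, m \right] = D \, \frac{\mathrm{d}}{\mathrm{d}m} \, m - m \, D \, \frac{\mathrm{d}}{\mathrm{d}m} = D + m \, D \, \frac{\mathrm{d}}{\mathrm{d}m} - m \, D \, \frac{\mathrm{d}}{\mathrm{d}m} = D.$$
By the usage of the field operator formalism the Gibbs free energy can be calculated, which permits the (approximate) inference of the posterior mean field via a numerically robust functional minimization.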
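The mechanics of the field operator can be sketched for a single degree of freedom, where $O_m = m + D\,\mathrm{d}/\mathrm{d}m$ acts on ordinary polynomials in $m$. Applying it $n$ times to the constant $1$ then yields the Gaussian moment $\langle s^n \rangle$ of $\mathcal{G}(s - m, D)$. The polynomial representation as a coefficient list and the function names are illustrative assumptions.

```python
def apply_field_operator(poly, D):
    """Apply O_m = m + D d/dm to a polynomial in m.

    poly[i] is the coefficient of m**i; D is a scalar dispersion.
    """
    result = [0.0] + list(poly)               # multiplication by m
    for i in range(1, len(poly)):
        result[i - 1] += D * i * poly[i]      # D times the derivative
    return result

def gaussian_moment(n, m, D):
    """<s**n> under G(s - m, D), obtained as (O_m)**n applied to 1."""
    poly = [1.0]
    for _ in range(n):
        poly = apply_field_operator(poly, D)
    return sum(c * m ** i for i, c in enumerate(poly))

m, D = 0.5, 2.0
print(gaussian_moment(2, m, D), m**2 + D)
print(gaussian_moment(3, m, D), m**3 + 3 * m * D)
print(gaussian_moment(4, m, D), m**4 + 6 * m**2 * D + 3 * D**2)
```

Each printed pair compares the operator result with the familiar closed-form Gaussian moment, so the two numbers on a line should agree.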
History
The book of Norbert Wiener might be regarded as one of the first works on field inference. The usage of path integrals for field inference was proposed by a number of authors, e.g. Edmund Bertschinger or William Bialek and A. Zee. The connection of field theory and Bayesian reasoning was made explicit by Jörg Lemm. The term ''information field theory'' was coined by Torsten Enßlin.
See the latter reference for more information on the history of IFT.
See also
* Bayesian inference
* Bayesian hierarchical modeling
* Gaussian process
* Statistical inference