In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of the typical set used in theories of data compression.
Roughly speaking, the theorem states that although there are many series of results that may be produced by a random process, the one actually produced is most probably from a loosely defined set of outcomes that all have approximately the same chance of being the one actually realized. (This is a consequence of the law of large numbers and ergodic theory.) Although there are individual outcomes which have a higher probability than any outcome in this set, the vast number of outcomes in the set almost guarantees that the outcome will come from the set. One way of intuitively understanding the property is through Cramér's large deviation theorem, which states that the probability of a large deviation from the mean decays exponentially with the number of samples. Such results are studied in large deviations theory; intuitively, it is the large deviations that would violate equipartition, but these are unlikely.
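For i.i.d. real-valued samples X_1, X_2, ... with mean \mu, Cramér's theorem makes this exponential decay precise (a standard formulation, stated under the usual assumption of a finite moment generating function):

```latex
\Pr\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge a\right] = e^{-n\,I(a) + o(n)}
\qquad (a > \mu),
\qquad\text{where } I(a) = \sup_{t\in\mathbb{R}} \bigl(t a - \log \mathbb{E}\,e^{t X_1}\bigr).
```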
In the field of pseudorandom number generation, a candidate generator of undetermined quality whose output sequence lies too far outside the typical set by some statistical criteria is rejected as insufficiently random. Thus, although the typical set is loosely defined, practical notions arise concerning ''sufficient'' typicality.
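As a sketch of one such criterion (a hypothetical illustration, not taken from any particular test suite), the following accepts a bit sequence as typical for a hypothesized i.i.d. Bernoulli(p) model when its per-symbol log-probability is within eps of the model entropy H(p):

```python
import math

def empirical_rate(bits, p):
    """-(1/n) log2 of the sequence probability under an i.i.d.
    Bernoulli(p) model, where p is the modeled probability of a 1."""
    n = len(bits)
    k = sum(bits)
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

def is_typical(bits, p, eps=0.05):
    """Accept the sequence when its per-symbol log-probability is
    within eps of the model entropy H(p) -- the typical-set criterion."""
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return abs(empirical_rate(bits, p) - h) < eps
```

Note that under a fair-coin model (p = 1/2) every n-bit sequence has probability 2^(-n), so this particular statistic cannot discriminate; practical randomness test suites therefore examine block frequencies, runs, and other statistics as well.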
Definition
Given a discrete-time stationary ergodic stochastic process X on the probability space (\Omega, \mathcal{B}, p), the asymptotic equipartition property is an assertion that

-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X) \quad \text{as } n \to \infty,

where H(X), or simply H, denotes the entropy rate of X, which must exist for all discrete-time stationary processes including the ergodic ones. The asymptotic equipartition property is proved for finite-valued (i.e. |\Omega| < \infty) stationary ergodic stochastic processes in the Shannon–McMillan–Breiman theorem using ergodic theory, and for any i.i.d. source directly using the law of large numbers in both the discrete-valued case (where H is simply the entropy of a symbol) and the continuous-valued case (where H is the differential entropy instead). The definition of the asymptotic equipartition property can also be extended to certain classes of continuous-time stochastic processes for which a typical set exists for long enough observation time. The convergence is proven almost surely in all cases.
Discrete-time i.i.d. sources
Given an i.i.d. source X which may take values in the alphabet \mathcal{X}, its time series X_1, \ldots, X_n is i.i.d. with entropy H(X). The weak law of large numbers gives the asymptotic equipartition property with convergence in probability,

\lim_{n\to\infty} \Pr\left[\left|-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) - H(X)\right| > \epsilon\right] = 0 \qquad \forall \epsilon > 0,

since the entropy is equal to the expectation of -\frac{1}{n}\log p(X_1, X_2, \ldots, X_n).

The strong law of large numbers asserts the stronger almost sure convergence,

\Pr\left[\lim_{n\to\infty} -\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) = H(X)\right] = 1.
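A quick numerical sketch of this convergence for a Bernoulli(0.3) source (illustrative code; the function name is our own):

```python
import math
import random

def neg_log_prob_rate(n, p, rng):
    """Draw n i.i.d. Bernoulli(p) symbols and return -(1/n) log2 of the
    probability of the realized sequence."""
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        total += -math.log2(p if x == 1 else 1 - p)
    return total / n

p = 0.3
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # H(X), about 0.881 bits
estimate = neg_log_prob_rate(100_000, p, random.Random(0))
# By the law of large numbers, estimate concentrates around entropy as n grows.
```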
Discrete-time finite-valued stationary ergodic sources
Consider a finite-valued sample space \Omega, i.e. |\Omega| < \infty, for the discrete-time stationary ergodic process X := \{X_n\} defined on the probability space (\Omega, \mathcal{B}, p). The asymptotic equipartition property for such a stochastic source is known as the Shannon–McMillan–Breiman theorem, due to Claude Shannon, Brockway McMillan, and Leo Breiman.
Non-stationary discrete-time source producing independent symbols
The assumptions of stationarity, ergodicity and identical distribution of random variables are not essential for the asymptotic equipartition property to hold. Indeed, as is quite clear intuitively, the asymptotic equipartition property requires only some form of the law of large numbers to hold, which is fairly general. However, the expression needs to be suitably generalized, and the conditions need to be formulated precisely.
We assume that the source is producing independent symbols, with possibly different output statistics at each instant. We assume that the statistics of the process are known completely, that is, the marginal distribution of the process seen at each time instant is known. The joint distribution is just the product of the marginals. Then, under the condition (which can be relaxed) that

\operatorname{Var}[\log p(X_i)] < M

for all i, for some M > 0, the following holds (AEP):

\lim_{n\to\infty} \Pr\left[\left|-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) - \bar{H}_n(X)\right| < \epsilon\right] = 1 \qquad \forall \epsilon > 0,

where

\bar{H}_n(X) = \frac{1}{n} H(X_1, X_2, \ldots, X_n).
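A numerical sketch (our own illustration) with independent but non-identically distributed bits, comparing the per-symbol log-probability of the realized sequence with the average per-symbol entropy:

```python
import math
import random

def nonstationary_rates(n, seed=0):
    """Independent but non-identically distributed bits X_i ~ Bernoulli(p_i).
    Returns (-(1/n) log2 p(X_1..X_n), the average per-symbol entropy)."""
    rng = random.Random(seed)
    neg_log = 0.0
    entropy_sum = 0.0
    for i in range(n):
        p_i = 0.2 + 0.06 * (i % 10)  # marginals cycle, bounded away from 0 and 1
        x = 1 if rng.random() < p_i else 0
        neg_log += -math.log2(p_i if x == 1 else 1 - p_i)
        entropy_sum += -(p_i * math.log2(p_i) + (1 - p_i) * math.log2(1 - p_i))
    return neg_log / n, entropy_sum / n

rate, avg_entropy = nonstationary_rates(200_000)
# The two quantities agree closely, as the generalized AEP predicts.
```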
Applications
The asymptotic equipartition property for non-stationary discrete-time independent processes leads us to (among other results) the source coding theorem for non-stationary sources (with independent output symbols) and the noisy-channel coding theorem for non-stationary memoryless channels.
Continuous-time stationary ergodic sources
Discrete-time functions can be interpolated to continuous-time functions. If such an interpolation f is measurable, we may define the continuous-time stationary process accordingly as \tilde{X} := f(X). If the asymptotic equipartition property holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by some measurable interpolation, i.e.

-\frac{1}{n} \log p(\tilde{X}_0^\tau) \to H(X),

where n corresponds to the degrees of freedom in time \tau; nH(X)/\tau and H(X) are the entropy per unit time and the entropy per degree of freedom, respectively, as defined by Shannon.
An important class of such continuous-time stationary processes is the bandlimited stationary ergodic process with the sample space being a subset of the continuous L_2 functions. The asymptotic equipartition property holds if the process is white, in which case the time samples are i.i.d., or if there exists T > 1/(2W), where W is the nominal bandwidth, such that the T-spaced time samples take values in a finite set, in which case we have the discrete-time finite-valued stationary ergodic process.
Any time-invariant operation also preserves the asymptotic equipartition property, stationarity and ergodicity, and we may easily turn a stationary process into a non-stationary one without losing the asymptotic equipartition property by nulling out a finite number of time samples in the process.
Category theory
A category theoretic definition for the equipartition property is given by Gromov. [Misha Gromov (2012), In a Search for a Structure, Part 1: On Entropy. ''(See page 5, where the equipartition property is called the 'Bernoulli approximation theorem'.)''] Given a sequence of Cartesian powers P^N = P \times \cdots \times P of a measure space P, this sequence admits an ''asymptotically equivalent'' sequence H_N of homogeneous measure spaces (''i.e.'' all sets have the same measure; all morphisms are invariant under the group of automorphisms, and thus factor as a morphism to the terminal object).
The above requires a definition of ''asymptotic equivalence''. This is given in terms of a distance function, giving how much an injective correspondence differs from an isomorphism. An injective correspondence \pi : P \to Q is a partially defined map that is a bijection; that is, it is a bijection between a subset P' \subset P and Q' \subset Q. Then define

|P - Q|_\pi = |P \setminus P'| + |Q \setminus Q'|,

where |S| denotes the measure of a set S. In what follows, the measures of P and Q are taken to be 1, so that the measure spaces are probability spaces. This distance |P - Q|_\pi is commonly known as the earth mover's distance or Wasserstein metric.
Similarly, define a logarithmic distance |\log P : Q|_\pi, with |\mathrm{set}(P)| taken to be the counting measure on P. Thus, this definition requires that P be a finite measure space. Finally, let

\mathrm{dist}_\pi(P, Q) = |P - Q|_\pi + |\log P : Q|_\pi.

A sequence of injective correspondences \pi_N : P^N \to H_N is then asymptotically equivalent when

\mathrm{dist}_{\pi_N}(P^N, H_N) \to 0 \quad \text{as } N \to \infty.

Given a homogeneous space sequence H_N that is asymptotically equivalent to P^N, the entropy H(P) of P may be taken as

H(P) = \lim_{N\to\infty} \frac{1}{N} \log |\mathrm{set}(H_N)|.
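As a consistency check against the classical statement (this example is standard AEP reasoning, not taken from Gromov's text): if P is a finite probability space with Shannon entropy H(P), the typical set of the power P^N contains roughly 2^{N H(P)} points, each of probability roughly 2^{-N H(P)}. A homogeneous space H_N with that many atoms of equal measure therefore approximates P^N, and

```latex
\frac{1}{N} \log_2 \bigl|\mathrm{set}(H_N)\bigr|
  \;\approx\; \frac{1}{N} \log_2 2^{\,N H(P)}
  \;=\; H(P),
```

so the homogeneous approximation recovers the Shannon entropy in the limit N \to \infty.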
See also
* Cramér's theorem (large deviations)
* Shannon's source coding theorem
* Noisy-channel coding theorem
Notes
References
Journal articles
* Claude E. Shannon. "A Mathematical Theory of Communication". ''Bell System Technical Journal'', July/October 1948.
*
* Sergio Verdu and Te Sun Han. "The Role of the Asymptotic Equipartition Property in Noiseless Source Coding." ''IEEE Transactions on Information Theory'', 43(3): 847–857, 1997.
Textbooks
*
* {{cite book |last=MacKay |first=David J.C. |author-link=David J. C. MacKay |url=http://www.inference.phy.cam.ac.uk/mackay/itila/book.html |title=Information Theory, Inference, and Learning Algorithms |publisher=Cambridge University Press |year=2003 |isbn=0-521-64298-1}}
Information theory
Theorems in statistics