In information theory, the asymptotic equipartition property (AEP) is a general property of the output samples of a stochastic source. It is fundamental to the concept of the typical set used in theories of data compression.
Roughly speaking, the theorem states that although there are many series of results that may be produced by a random process, the one actually produced is most probably from a loosely defined set of outcomes that all have approximately the same chance of being the one actually realized. (This is a consequence of the law of large numbers and ergodic theory.) Although there are individual outcomes which have a higher probability than any outcome in this set, the vast number of outcomes in the set almost guarantees that the outcome will come from the set. One way of intuitively understanding the property is through Cramér's large deviation theorem, which states that the probability of a large deviation from the mean decays exponentially with the number of samples. Such results are studied in large deviations theory; intuitively, it is the large deviations that would violate equipartition, but these are unlikely.
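For i.i.d. real-valued samples X_1, X_2, ... with mean \mu, Cramér's theorem makes this exponential decay precise (a standard formulation, stated under the usual assumption of a finite moment generating function):

```latex
\Pr\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge a\right] = e^{-n\,I(a) + o(n)}
\qquad (a > \mu),
\qquad\text{where } I(a) = \sup_{t\in\mathbb{R}} \bigl(t a - \log \mathbb{E}\,e^{t X_1}\bigr).
```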
In the field of pseudorandom number generation, a candidate generator of undetermined quality whose output sequence lies too far outside the typical set by some statistical criteria is rejected as insufficiently random. Thus, although the typical set is loosely defined, practical notions arise concerning ''sufficient'' typicality.
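As a sketch of one such criterion (a hypothetical illustration, not taken from any particular test suite), the following accepts a bit sequence as typical for a hypothesized i.i.d. Bernoulli(p) model when its per-symbol log-probability is within eps of the model entropy H(p):

```python
import math

def empirical_rate(bits, p):
    """-(1/n) log2 of the sequence probability under an i.i.d.
    Bernoulli(p) model, where p is the modeled probability of a 1."""
    n = len(bits)
    k = sum(bits)
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

def is_typical(bits, p, eps=0.05):
    """Accept the sequence when its per-symbol log-probability is
    within eps of the model entropy H(p) -- the typical-set criterion."""
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return abs(empirical_rate(bits, p) - h) < eps
```

Note that under a fair-coin model (p = 1/2) every n-bit sequence has probability 2^(-n), so this particular statistic cannot discriminate; practical randomness test suites therefore examine block frequencies, runs, and other statistics as well.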
Definition
Given a discrete-time stationary ergodic stochastic process X on the probability space (\Omega, \mathcal{B}, p), the asymptotic equipartition property is an assertion that

-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X) \quad \text{as } n \to \infty,

where H(X), or simply H, denotes the entropy rate of X, which must exist for all discrete-time stationary processes including the ergodic ones. The asymptotic equipartition property is proved for finite-valued (i.e. |\Omega| < \infty) stationary ergodic stochastic processes in the Shannon–McMillan–Breiman theorem using ergodic theory, and for any i.i.d. source directly using the law of large numbers in both the discrete-valued case (where H is simply the entropy of a symbol) and the continuous-valued case (where H is the differential entropy instead). The definition of the asymptotic equipartition property can also be extended to certain classes of continuous-time stochastic processes for which a typical set exists for long enough observation time. The convergence is proven almost surely in all cases.
Discrete-time i.i.d. sources
Given an i.i.d. source X which may take values in the alphabet \mathcal{X}, its time series X_1, \ldots, X_n is i.i.d. with entropy H(X). The weak law of large numbers gives the asymptotic equipartition property with convergence in probability,

\lim_{n\to\infty} \Pr\left[\left|-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) - H(X)\right| > \epsilon\right] = 0 \qquad \forall \epsilon > 0,

since the entropy is equal to the expectation of -\frac{1}{n}\log p(X_1, X_2, \ldots, X_n).

The strong law of large numbers asserts the stronger almost sure convergence,

\Pr\left[\lim_{n\to\infty} -\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) = H(X)\right] = 1.
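A quick numerical sketch of this convergence for a Bernoulli(0.3) source (illustrative code; the function name is our own):

```python
import math
import random

def neg_log_prob_rate(n, p, rng):
    """Draw n i.i.d. Bernoulli(p) symbols and return -(1/n) log2 of the
    probability of the realized sequence."""
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        total += -math.log2(p if x == 1 else 1 - p)
    return total / n

p = 0.3
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # H(X), about 0.881 bits
estimate = neg_log_prob_rate(100_000, p, random.Random(0))
# By the law of large numbers, estimate concentrates around entropy as n grows.
```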
Discrete-time finite-valued stationary ergodic sources
Consider a finite-valued sample space \Omega, i.e. |\Omega| < \infty, for the discrete-time stationary ergodic process X := \{X_n\} defined on the probability space (\Omega, \mathcal{B}, p). The asymptotic equipartition property for such a stochastic source is known as the Shannon–McMillan–Breiman theorem, due to Claude Shannon, Brockway McMillan, and Leo Breiman.
Non-stationary discrete-time source producing independent symbols
The assumptions of stationarity, ergodicity and identical distribution of random variables are not essential for the asymptotic equipartition property to hold. Indeed, as is quite clear intuitively, the asymptotic equipartition property requires only some form of the law of large numbers to hold, which is fairly general. However, the expression needs to be suitably generalized, and the conditions need to be formulated precisely.
We assume that the source is producing independent symbols, with possibly different output statistics at each instant. We assume that the statistics of the process are known completely, that is, the marginal distribution of the process seen at each time instant is known. The joint distribution is just the product of the marginals. Then, under the condition (which can be relaxed) that

\operatorname{Var}[\log p(X_i)] < M

for all i, for some M > 0, the following holds (AEP):

\lim_{n\to\infty} \Pr\left[\left|-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) - \bar{H}_n(X)\right| < \epsilon\right] = 1 \qquad \forall \epsilon > 0,

where

\bar{H}_n(X) = \frac{1}{n} H(X_1, X_2, \ldots, X_n).
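A numerical sketch (our own illustration) with independent but non-identically distributed bits, comparing the per-symbol log-probability of the realized sequence with the average per-symbol entropy:

```python
import math
import random

def nonstationary_rates(n, seed=0):
    """Independent but non-identically distributed bits X_i ~ Bernoulli(p_i).
    Returns (-(1/n) log2 p(X_1..X_n), the average per-symbol entropy)."""
    rng = random.Random(seed)
    neg_log = 0.0
    entropy_sum = 0.0
    for i in range(n):
        p_i = 0.2 + 0.06 * (i % 10)  # marginals cycle, bounded away from 0 and 1
        x = 1 if rng.random() < p_i else 0
        neg_log += -math.log2(p_i if x == 1 else 1 - p_i)
        entropy_sum += -(p_i * math.log2(p_i) + (1 - p_i) * math.log2(1 - p_i))
    return neg_log / n, entropy_sum / n

rate, avg_entropy = nonstationary_rates(200_000)
# The two quantities agree closely, as the generalized AEP predicts.
```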
Applications
The asymptotic equipartition property for non-stationary discrete-time independent processes leads us to (among other results) the source coding theorem for non-stationary sources (with independent output symbols) and the noisy-channel coding theorem for non-stationary memoryless channels.
Continuous-time stationary ergodic sources
Discrete-time functions can be interpolated to continuous-time functions. If such an interpolation f is measurable, we may define the continuous-time stationary process accordingly as \tilde{X} := f(X). If the asymptotic equipartition property holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by some measurable interpolation, i.e.

-\frac{1}{n} \log p(\tilde{X}_0^\tau) \to H(X),

where n corresponds to the degrees of freedom in time \tau; nH(X)/\tau and H(X) are the entropy per unit time and the entropy per degree of freedom, respectively, as defined by Shannon.
An important class of such continuous-time stationary processes is the bandlimited stationary ergodic process with the sample space being a subset of the continuous L_2 functions. The asymptotic equipartition property holds if the process is white, in which case the time samples are i.i.d., or if there exists T > 1/(2W), where W is the nominal bandwidth, such that the T-spaced time samples take values in a finite set, in which case we have the discrete-time finite-valued stationary ergodic process.
Any time-invariant operation also preserves the asymptotic equipartition property, stationarity and ergodicity, and we may easily turn a stationary process into a non-stationary one without losing the asymptotic equipartition property by nulling out a finite number of time samples in the process.
Category theory
A category theoretic definition for the equipartition property is given by Gromov. [Misha Gromov (2012), In a Search for a Structure, Part 1: On Entropy. ''(See page 5, where the equipartition property is called the 'Bernoulli approximation theorem'.)''] Given a sequence of Cartesian powers P^N = P \times \cdots \times P of a measure space P, this sequence admits an ''asymptotically equivalent'' sequence H_N of homogeneous measure spaces (''i.e.'' all sets have the same measure; all morphisms are invariant under the group of automorphisms, and thus factor as a morphism to the terminal object).
The above requires a definition of ''asymptotic equivalence''. This is given in terms of a distance function, giving how much an injective correspondence differs from an isomorphism. An injective correspondence \pi : P \to Q is a partially defined map that is a bijection; that is, it is a bijection between a subset P' \subset P and Q' \subset Q. Then define

|P - Q|_\pi = |P \setminus P'| + |Q \setminus Q'|,

where |S| denotes the measure of a set S. In what follows, the measures of P and Q are taken to be 1, so that the measure spaces are probability spaces. This distance |P - Q|_\pi is commonly known as the earth mover's distance or Wasserstein metric.
Similarly, define a logarithmic distance |\log P : Q|_\pi, with |\mathrm{set}(P)| taken to be the counting measure on P. Thus, this definition requires that P be a finite measure space. Finally, let

\mathrm{dist}_\pi(P, Q) = |P - Q|_\pi + |\log P : Q|_\pi.

A sequence of injective correspondences \pi_N : P^N \to H_N is then asymptotically equivalent when

\mathrm{dist}_{\pi_N}(P^N, H_N) \to 0 \quad \text{as } N \to \infty.

Given a homogeneous space sequence H_N that is asymptotically equivalent to P^N, the entropy H(P) of P may be taken as

H(P) = \lim_{N\to\infty} \frac{1}{N} \log |\mathrm{set}(H_N)|.
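As a consistency check against the classical statement (this example is standard AEP reasoning, not taken from Gromov's text): if P is a finite probability space with Shannon entropy H(P), the typical set of the power P^N contains roughly 2^{N H(P)} points, each of probability roughly 2^{-N H(P)}. A homogeneous space H_N with that many atoms of equal measure therefore approximates P^N, and

```latex
\frac{1}{N} \log_2 \bigl|\mathrm{set}(H_N)\bigr|
  \;\approx\; \frac{1}{N} \log_2 2^{\,N H(P)}
  \;=\; H(P),
```

so the homogeneous approximation recovers the Shannon entropy in the limit N \to \infty.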
See also
* Cramér's theorem (large deviations)
* Shannon's source coding theorem
* Noisy-channel coding theorem
Notes
References
Journal articles
* Claude E. Shannon. "A Mathematical Theory of Communication". ''Bell System Technical Journal'', July/October 1948.
*
* Sergio Verdu and Te Sun Han. "The Role of the Asymptotic Equipartition Property in Noiseless Source Coding." ''IEEE Transactions on Information Theory'', 43(3): 847–857, 1997.
Textbooks
*
* {{cite book |last=MacKay |first=David J.C. |author-link=David J. C. MacKay |url=http://www.inference.phy.cam.ac.uk/mackay/itila/book.html |title=Information Theory, Inference, and Learning Algorithms |publisher=Cambridge University Press |year=2003 |isbn=0-521-64298-1}}
Information theory
Theorems in statistics