In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of the entropy of their source distribution. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP), which is a kind of law of large numbers. The notion of typicality is only concerned with the probability of a sequence and not the actual sequence itself.
This has great use in compression theory as it provides a theoretical means for compressing data, allowing us to represent any sequence ''X''<sup>''n''</sup> using ''nH''(''X'') bits on average, and, hence, justifying the use of entropy as a measure of information from a source.
The AEP can also be proven for a large class of stationary ergodic processes, allowing the typical set to be defined in more general cases.
(Weakly) typical sequences (weak typicality, entropy typicality)
If a sequence ''x''<sub>1</sub>, ..., ''x''<sub>''n''</sub> is drawn from an i.i.d. distribution ''X'' defined over a finite alphabet <math>\mathcal{X}</math>, then the typical set, <math>A_\varepsilon^{(n)} \subset \mathcal{X}^n</math>, is defined as those sequences which satisfy:
:<math>2^{-n(H(X)+\varepsilon)} \leqslant p(x_1, x_2, \ldots, x_n) \leqslant 2^{-n(H(X)-\varepsilon)},</math>
where
:<math>H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)</math>
is the information entropy of ''X''. The probability above need only be within a factor of <math>2^{n\varepsilon}</math>. Taking the logarithm on all sides and dividing by −''n'', this definition can be equivalently stated as
:<math>H(X) - \varepsilon \leqslant -\frac{1}{n} \log_2 p(x_1, x_2, \ldots, x_n) \leqslant H(X) + \varepsilon.</math>
For an i.i.d. sequence, since
:<math>p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i),</math>
we further have
:<math>-\frac{1}{n} \log_2 p(x_1, x_2, \ldots, x_n) = -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i).</math>
By the law of large numbers, for sufficiently large ''n''
:<math>-\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i) \rightarrow H(X).</math>
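The definition above can be checked mechanically for a concrete sample. The following is a minimal Python sketch of such a weak-typicality test; the distribution, function names, and tolerance are illustrative assumptions, not part of the original definition.

<syntaxhighlight lang="python">
import numpy as np

def entropy(p):
    """Shannon entropy H(X) in bits of a distribution given as {symbol: probability}."""
    probs = np.array(list(p.values()))
    return -np.sum(probs * np.log2(probs))

def is_weakly_typical(seq, p, eps):
    """True if |-(1/n) log2 p(x_1,...,x_n) - H(X)| <= eps for an i.i.d. source p."""
    n = len(seq)
    avg_neg_log_prob = -sum(np.log2(p[x]) for x in seq) / n
    return abs(avg_neg_log_prob - entropy(p)) <= eps

# Example: a Bernoulli(0.9) source. Long i.i.d. samples are typical with high probability,
# while the single most likely sequence (all 1's) is not.
p = {0: 0.1, 1: 0.9}
rng = np.random.default_rng(0)
sample = rng.choice([0, 1], size=10_000, p=[p[0], p[1]]).tolist()
print(is_weakly_typical(sample, p, eps=0.05))        # almost surely True
print(is_weakly_typical([1] * 10_000, p, eps=0.05))  # False
</syntaxhighlight>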
Properties
An essential characteristic of the typical set is that, if one draws a large number ''n'' of independent random samples from the distribution ''X'', the resulting sequence (''x''<sub>1</sub>, ''x''<sub>2</sub>, ..., ''x''<sub>''n''</sub>) is very likely to be a member of the typical set, even though the typical set comprises only a small fraction of all the possible sequences. Formally, given any <math>\varepsilon > 0</math>, one can choose ''n'' such that:
#The probability of a sequence from <math>X^{(n)}</math> being drawn from <math>A_\varepsilon^{(n)}</math> is greater than 1 − ''ε'', i.e. <math>P(x^{(n)} \in A_\varepsilon^{(n)}) \geqslant 1 - \varepsilon.</math>
#<math>\left| A_\varepsilon^{(n)} \right| \leqslant 2^{n(H(X)+\varepsilon)}.</math>
#<math>\left| A_\varepsilon^{(n)} \right| \geqslant (1 - \varepsilon) 2^{n(H(X)-\varepsilon)}.</math>
#If the distribution over <math>\mathcal{X}</math> is not uniform, then the fraction of sequences that are typical is
::<math>\frac{\left|A_\varepsilon^{(n)}\right|}{\left|\mathcal{X}^{(n)}\right|} \equiv \frac{2^{nH(X)}}{2^{n \log_2 |\mathcal{X}|}} = 2^{-n(\log_2 |\mathcal{X}| - H(X))} \rightarrow 0</math>
::as ''n'' becomes very large, since <math>H(X) < \log_2 |\mathcal{X}|,</math> where <math>|\mathcal{X}|</math> is the cardinality of <math>\mathcal{X}</math>.
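As a numerical illustration of the last property (a sketch; the binary alphabet and the entropy value are borrowed from the Bernoulli example below), the typical fraction shrinks exponentially in ''n'':

<syntaxhighlight lang="python">
# Fraction of all binary sequences of length n that are typical, about 2^(-n(log2|X| - H(X))).
# Here |X| = 2 (so log2|X| = 1) and H(X) = 0.469 bits, the Bernoulli(0.9) source of the example below.
H, log2_alphabet = 0.469, 1.0
for n in (10, 100, 1000):
    print(n, 2.0 ** (-n * (log2_alphabet - H)))
# prints roughly 2.5e-2, 1.0e-16, and 1.4e-160
</syntaxhighlight>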
For a general stochastic process with AEP, the (weakly) typical set can be defined similarly, with ''p''(''x''<sub>1</sub>, ''x''<sub>2</sub>, ..., ''x''<sub>''n''</sub>) replaced by ''p''(''x''<sub>0</sub><sup>''τ''</sup>) (i.e. the probability of the sample limited to the time interval [0, ''τ'']), ''n'' being the degree of freedom of the process in the time interval, and ''H''(''X'') being the entropy rate. If the process is continuous-valued, differential entropy is used instead.
Example
Counter-intuitively, the most likely sequence is often not a member of the typical set. For example, suppose that ''X'' is an i.i.d. Bernoulli random variable with ''p''(0) = 0.1 and ''p''(1) = 0.9. In ''n'' independent trials, since ''p''(1) > ''p''(0), the most likely sequence of outcomes is the sequence of all 1's, (1, 1, ..., 1). Here the entropy of ''X'' is ''H''(''X'') = 0.469, while
:<math>-\frac{1}{n} \log_2 p\left(x^{(n)} = (1, 1, \ldots, 1)\right) = -\frac{1}{n} \log_2 (0.9^n) = 0.152.</math>
So this sequence is not in the typical set because its average logarithmic probability cannot come arbitrarily close to the entropy of the random variable ''X'' no matter how large we take the value of ''n''.
For Bernoulli random variables, the typical set consists of sequences with the average number of 0s and 1s in ''n'' independent trials. This is easily demonstrated: if ''p''(1) = ''p'' and ''p''(0) = 1 − ''p'', then for ''n'' trials with ''m'' 1's, we have
:<math>-\frac{1}{n} \log_2 p(x_1, x_2, \ldots, x_n) = -\frac{1}{n} \log_2 \left( p^m (1-p)^{n-m} \right) = -\frac{m}{n} \log_2 p - \frac{n-m}{n} \log_2 (1-p).</math>
The average number of 1's in a sequence of Bernoulli trials is ''m'' = ''np''. Thus, we have
:<math>-\frac{1}{n} \log_2 p(x_1, x_2, \ldots, x_n) = -p \log_2 p - (1-p) \log_2 (1-p) = H(X).</math>
For this example, if ''n'' = 10, then the typical set consists of all sequences that have a single 0 in the entire sequence. In case ''p''(0) = ''p''(1) = 0.5, then every possible binary sequence belongs to the typical set.
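The figures in this example can be reproduced directly; a minimal sketch (the example fixes no particular ε, so none is assumed here):

<syntaxhighlight lang="python">
import numpy as np

p0, p1 = 0.1, 0.9

# Entropy of the Bernoulli(0.9) source, in bits.
H = -(p0 * np.log2(p0) + p1 * np.log2(p1))
print(round(H, 3))             # 0.469

# Average log-probability of the all-1's (most likely) sequence: -(1/n) log2(0.9^n) = -log2(0.9).
print(round(-np.log2(p1), 3))  # 0.152, which stays far from H(X) for every n, so it is never typical

# A sequence with the expected number of 1's (m = np) attains H(X) exactly.
print(round(-(p1 * np.log2(p1) + p0 * np.log2(p0)), 3))  # 0.469
</syntaxhighlight>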
Strongly typical sequences (strong typicality, letter typicality)
If a sequence ''x''<sub>1</sub>, ..., ''x''<sub>''n''</sub> is drawn from some specified joint distribution defined over a finite or an infinite alphabet <math>\mathcal{X}</math>, then the strongly typical set, <math>A_{\varepsilon,\text{strong}}^{(n)} \subset \mathcal{X}^n</math>, is defined as the set of sequences which satisfy
:<math>\left| \frac{N(x_i)}{n} - p(x_i) \right| < \frac{\varepsilon}{\|\mathcal{X}\|},</math>
where <math>N(x_i)</math> is the number of occurrences of a specific symbol in the sequence.
It can be shown that strongly typical sequences are also weakly typical (with a different constant ε), and hence the name. The two forms, however, are not equivalent. Strong typicality is often easier to work with in proving theorems for memoryless channels. However, as is apparent from the definition, this form of typicality is only defined for random variables having finite support.
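A strong-typicality check needs only the empirical symbol frequencies. A minimal sketch under the definition above (the function name and example distribution are illustrative):

<syntaxhighlight lang="python">
from collections import Counter

def is_strongly_typical(seq, p, eps):
    """True if |N(a)/n - p(a)| < eps/|alphabet| for every symbol a of the alphabet."""
    n = len(seq)
    counts = Counter(seq)
    return all(abs(counts.get(a, 0) / n - p[a]) < eps / len(p) for a in p)

p = {0: 0.1, 1: 0.9}                                   # the Bernoulli(0.9) source again
print(is_strongly_typical([1] * 9 + [0], p, eps=0.2))  # True: empirical frequencies match p
print(is_strongly_typical([1] * 10, p, eps=0.2))       # False: the symbol 0 never occurs
</syntaxhighlight>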
Jointly typical sequences
Two sequences <math>x^n</math> and <math>y^n</math> are jointly ε-typical if the pair <math>(x^n, y^n)</math> is ε-typical with respect to the joint distribution <math>p(x^n, y^n) = \prod_{i=1}^n p(x_i, y_i)</math> and both <math>x^n</math> and <math>y^n</math> are ε-typical with respect to their marginal distributions <math>p(x^n)</math> and <math>p(y^n)</math>. The set of all such pairs of sequences <math>(x^n, y^n)</math> is denoted by <math>A_\varepsilon^n(X, Y)</math>. Jointly ε-typical ''n''-tuple sequences are defined similarly.
Let <math>\tilde{X}^n</math> and <math>\tilde{Y}^n</math> be two independent sequences of random variables with the same marginal distributions <math>p(x^n)</math> and <math>p(y^n)</math>. Then for any ε > 0, for sufficiently large ''n'', jointly typical sequences satisfy the following properties:
#<math>P\left[ (X^n, Y^n) \in A_\varepsilon^n(X, Y) \right] \geqslant 1 - \varepsilon</math>
#<math>\left| A_\varepsilon^n(X, Y) \right| \leqslant 2^{n(H(X, Y) + \varepsilon)}</math>
#<math>\left| A_\varepsilon^n(X, Y) \right| \geqslant (1 - \varepsilon) 2^{n(H(X, Y) - \varepsilon)}</math>
#<math>P\left[ (\tilde{X}^n, \tilde{Y}^n) \in A_\varepsilon^n(X, Y) \right] \leqslant 2^{-n(I(X; Y) - 3\varepsilon)}</math>
#<math>P\left[ (\tilde{X}^n, \tilde{Y}^n) \in A_\varepsilon^n(X, Y) \right] \geqslant (1 - \varepsilon) 2^{-n(I(X; Y) + 3\varepsilon)}</math>
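In code, joint typicality amounts to three weak-typicality tests: one against the joint distribution and one against each marginal. A self-contained sketch (the helper names and the example joint distribution are illustrative assumptions):

<syntaxhighlight lang="python">
import numpy as np

def entropy(p):
    probs = np.array(list(p.values()))
    return -np.sum(probs * np.log2(probs))

def avg_neg_log_prob(seq, p):
    """-(1/n) log2 p(sequence) for an i.i.d. distribution p given as {symbol: probability}."""
    return -sum(np.log2(p[s]) for s in seq) / len(seq)

def is_jointly_typical(xs, ys, p_joint, eps):
    """(x^n, y^n) must be eps-typical w.r.t. p(x, y), and each sequence w.r.t. its marginal."""
    p_x, p_y = {}, {}
    for (x, y), pr in p_joint.items():
        p_x[x] = p_x.get(x, 0.0) + pr
        p_y[y] = p_y.get(y, 0.0) + pr
    checks = [(list(zip(xs, ys)), p_joint), (xs, p_x), (ys, p_y)]
    return all(abs(avg_neg_log_prob(seq, p) - entropy(p)) <= eps for seq, p in checks)

# Example: a noisy-channel-like joint distribution over pairs of bits.
p_joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
rng = np.random.default_rng(1)
idx = rng.choice(len(p_joint), size=5000, p=list(p_joint.values()))
keys = list(p_joint.keys())
xs, ys = [keys[i][0] for i in idx], [keys[i][1] for i in idx]
print(is_jointly_typical(xs, ys, p_joint, eps=0.05))  # very likely True for large n
</syntaxhighlight>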
Applications of typicality
Typical set encoding
In information theory, typical set encoding encodes only the sequences in the typical set of a stochastic source with fixed-length block codes. Since the size of the typical set is about 2<sup>''nH''(''X'')</sup>, only ''nH''(''X'') bits are required for the coding, while at the same time ensuring that the chance of encoding error is limited to ε. Asymptotically, it is, by the AEP, lossless and achieves the minimum rate equal to the entropy rate of the source.
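A toy fixed-length encoder along these lines is sketched below. It enumerates the typical set explicitly, so it is only feasible for very small ''n''; the source, block length, and ε are illustrative choices, not taken from the text.

<syntaxhighlight lang="python">
import itertools
import math

def typical_set(p, n, eps):
    """Enumerate the weakly eps-typical sequences of length n for the i.i.d. source p."""
    H = -sum(q * math.log2(q) for q in p.values())
    out = []
    for seq in itertools.product(p.keys(), repeat=n):
        avg = -sum(math.log2(p[s]) for s in seq) / n
        if abs(avg - H) <= eps:
            out.append(seq)
    return out

def encode(seq, index):
    """Fixed-length code: the index of seq within the typical set, or None (error) if atypical."""
    return index.get(tuple(seq))

p = {0: 0.1, 1: 0.9}
n, eps = 10, 0.2
A = typical_set(p, n, eps)
index = {seq: i for i, seq in enumerate(A)}
print(len(A), "typical sequences ->", math.ceil(math.log2(len(A))), "bits per block")
print(encode([1] * 9 + [0], index))  # a typical sequence receives a codeword
print(encode([0] * 10, index))       # an atypical sequence causes an encoding error (None)
</syntaxhighlight>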
Typical set decoding
In information theory, typical set decoding is used in conjunction with random coding to estimate the transmitted message as the one with a codeword that is jointly ε-typical with the observation, i.e.
:<math>\hat{w} = w \iff (x_1^n(w), y_1^n) \in A_\varepsilon^n(X, Y),</math>
where <math>\hat{w}, x_1^n(w), y_1^n</math> are the message estimate, the codeword of message <math>w</math>, and the observation respectively. <math>A_\varepsilon^n(X, Y)</math> is defined with respect to the joint distribution <math>p(x_1^n) p(y_1^n | x_1^n)</math>, where <math>p(y_1^n | x_1^n)</math> is the transition probability that characterizes the channel statistics, and <math>p(x_1^n)</math> is some input distribution used to generate the codewords in the random codebook.
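The rule can be illustrated on a binary symmetric channel. The sketch below uses a crude surrogate for the joint-typicality test (the empirical crossover fraction must be close to the channel's crossover probability); the channel, codebook size, and tolerance are illustrative assumptions, not the formal test above.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
n, num_messages, flip = 200, 4, 0.1        # block length, codebook size, BSC crossover probability

# Random codebook: each codeword drawn i.i.d. uniform over {0, 1}.
codebook = rng.integers(0, 2, size=(num_messages, n))

def jointly_typical(x, y, flip, eps=0.05):
    """Crude BSC joint-typicality surrogate: the empirical flip rate should be close to `flip`."""
    return abs(np.mean(x != y) - flip) <= eps

def decode(y):
    """Declare w if its codeword is the unique one jointly typical with the observation y."""
    hits = [w for w in range(num_messages) if jointly_typical(codebook[w], y, flip)]
    return hits[0] if len(hits) == 1 else None  # None signals a decoding error

w = 2
y = codebook[w] ^ (rng.random(n) < flip)       # transmit codeword w through the BSC
print(decode(y))                               # 2 with high probability
</syntaxhighlight>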
Universal null-hypothesis testing
Universal channel code
See also
*Asymptotic equipartition property
*Source coding theorem
*Noisy-channel coding theorem