In
information theory
Information theory is the mathematical study of the quantification (science), quantification, Data storage, storage, and telecommunications, communication of information. The field was established and formalized by Claude Shannon in the 1940s, ...
, Shannon's source coding theorem (or noiseless coding theorem) establishes the statistical limits to possible
data compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...
for data whose source is an
independent identically-distributed random variable, and the operational meaning of the
Shannon entropy.
Named after
Claude Shannon
Claude Elwood Shannon (April 30, 1916 – February 24, 2001) was an American mathematician, electrical engineer, computer scientist, cryptographer and inventor known as the "father of information theory" and the man who laid the foundations of th ...
, the source coding theorem shows that, in the limit, as the length of a stream of
independent and identically-distributed random variable (i.i.d.) data tends to infinity, it is impossible to compress such data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.
The source coding theorem for symbol codes places an upper and a lower bound on the minimal possible expected length of codewords as a function of the
entropy
Entropy is a scientific concept, most commonly associated with states of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynamics, where it was first recognized, to the micros ...
of the input word (which is viewed as a
random variable
A random variable (also called random quantity, aleatory variable, or stochastic variable) is a Mathematics, mathematical formalization of a quantity or object which depends on randomness, random events. The term 'random variable' in its mathema ...
) and of the size of the target alphabet.
Note that, for data that exhibits more dependencies (whose source is not an i.i.d. random variable), the
Kolmogorov complexity, which quantifies the minimal description length of an object, is more suitable to describe the limits of data compression. Shannon entropy takes into account only frequency regularities while Kolmogorov complexity takes into account all algorithmic regularities, so in general the latter is smaller. On the other hand, if an object is generated by a random process in such a way that it has only frequency regularities, entropy is close to complexity with high probability (Shen et al. 2017).
Statements
''Source coding'' is a mapping from (a sequence of) symbols from an information
source to a sequence of alphabet symbols (usually bits) such that the source symbols can be exactly recovered from the binary bits (lossless source coding) or recovered within some distortion (lossy source coding). This is one approach to
data compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...
.
Source coding theorem
In information theory, the source coding theorem (Shannon 1948)
informally states that (MacKay 2003, pg. 81,
Cover 2006, Chapter 5
):
i.i.d. random variables each with entropy can be compressed into more than bits with negligible risk of information loss, as ; but conversely, if they are compressed into fewer than bits it is virtually certain that information will be lost.
The
coded sequence represents the compressed message in a biunivocal way, under the assumption that the decoder knows the source. From a practical point of view, this hypothesis is not always true. Consequently, when the entropy encoding is applied the transmitted message is
. Usually, the information that characterizes the source is inserted at the beginning of the transmitted message.
Source coding theorem for symbol codes
Let denote two finite alphabets and let and denote the
set of all finite words from those alphabets (respectively).
Suppose that is a random variable taking values in and let be a
uniquely decodable code from to where . Let denote the random variable given by the length of codeword .
If is optimal in the sense that it has the minimal expected word length for , then (Shannon 1948):
:
Where
denotes the
expected value
In probability theory, the expected value (also called expectation, expectancy, expectation operator, mathematical expectation, mean, expectation value, or first Moment (mathematics), moment) is a generalization of the weighted average. Informa ...
operator.
Proof: source coding theorem
Given is an
i.i.d. source, its
time series
In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. ...
is i.i.d. with
entropy
Entropy is a scientific concept, most commonly associated with states of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynamics, where it was first recognized, to the micros ...
in the discrete-valued case and
differential entropy in the continuous-valued case. The Source coding theorem states that for any , i.e. for any
rate larger than the
entropy
Entropy is a scientific concept, most commonly associated with states of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynamics, where it was first recognized, to the micros ...
of the source, there is large enough and an encoder that takes i.i.d. repetition of the source, , and maps it to binary bits such that the source symbols are recoverable from the binary bits with probability of at least .
Proof of Achievability. Fix some , and let
:
The
typical set, , is defined as follows:
:
The
asymptotic equipartition property (AEP) shows that for large enough , the probability that a sequence generated by the source lies in the typical set, , as defined approaches one. In particular, for sufficiently large ,
can be made arbitrarily close to 1, and specifically, greater than
(See
AEP for a proof).
The definition of typical sets implies that those sequences that lie in the typical set satisfy:
:
*The probability of a sequence
being drawn from is greater than .
*
, which follows from the left hand side (lower bound) for
.
*
, which follows from upper bound for
and the lower bound on the total probability of the whole set .
Since
bits are enough to point to any string in this set.
The encoding algorithm: the encoder checks if the input sequence lies within the typical set; if yes, it outputs the index of the input sequence within the typical set; if not, the encoder outputs an arbitrary digit number. As long as the input sequence lies within the typical set (with probability at least ), the encoder does not make any error. So, the probability of error of the encoder is bounded above by .
''Proof of converse'': the converse is proved by showing that any set of size smaller than (in the sense of exponent) would cover a set of probability bounded away from .
Proof: Source coding theorem for symbol codes
For let denote the word length of each possible . Define
, where is chosen so that . Then
:
where the second line follows from
Gibbs' inequality and the fifth line follows from
Kraft's inequality:
:
so .
For the second inequality we may set
:
so that
:
and so
:
and
:
and so by Kraft's inequality there exists a prefix-free code having those word lengths. Thus the minimal satisfies
:
Extension to non-stationary independent sources
Fixed rate lossless source coding for discrete time non-stationary independent sources
Define typical set as:
:
Then, for given , for large enough, . Now we just encode the sequences in the typical set, and usual methods in source coding show that the cardinality of this set is smaller than
. Thus, on an average, bits suffice for encoding with probability greater than , where and can be made arbitrarily small, by making larger.
See also
*
Channel coding
In computing, telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for error control, controlling errors in data transmission over unreliable or noisy communication channel ...
*
Error exponent
*
Noisy-channel coding theorem
In information theory, the noisy-channel coding theorem (sometimes Shannon's theorem or Shannon's limit), establishes that for any given degree of noise contamination of a communication channel, it is possible (in theory) to communicate discrete ...
References
{{Reflist, refs=
[{{cite book , author = Shen, A. and Uspensky, V.A. and Vereshchagin, N. , title = Kolmogorov Complexity and Algorithmic Randomness , chapter = Chapter 7.3. : Complexity and entropy , year = 2017 , publisher = American Mathematical Society , isbn = 9781470431822 , url=https://books.google.com/books?id=IPNOvQEACAAJ , pages=226]
[ C.E. Shannon,]
A Mathematical Theory of Communication
{{Webarchive, url=https://web.archive.org/web/20090216231139/http://plan9.bell-labs.com//cm//ms//what//shannonday//shannon1948.pdf , date=2009-02-16 ", '' Bell System Technical Journal'', vol. 27, pp. 379–423, 623-656, July, October, 1948
[David J. C. MacKay. ]
Information Theory, Inference, and Learning Algorithms
' Cambridge: Cambridge University Press, 2003. {{ISBN, 0-521-64298-1
[{{cite book , last = Cover , first = Thomas M. , title = Elements of Information Theory , chapter = Chapter 5: Data Compression , year = 2006 , publisher = John Wiley & Sons , isbn = 0-471-24195-4, url=https://books.google.com/books?id=VWq5GG6ycxMC, pages=103–142]
Information theory
Coding theory
Data compression
Presentation layer protocols
Mathematical theorems in theoretical computer science
Articles containing proofs