In
information theory
Information theory is the mathematical study of the quantification (science), quantification, Data storage, storage, and telecommunications, communication of information. The field was established and formalized by Claude Shannon in the 1940s, ...
, an entropy coding (or entropy encoding) is any
lossless data compression
Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits Redundanc ...
method that attempts to approach the lower bound declared by
Shannon's source coding theorem, which states that any lossless data compression method must have an expected code length greater than or equal to the entropy of the source.
More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies
Huffman coding
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by ...
and
arithmetic coding
Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a String (computer science), string of characters is represented using a fixed number of bits per character, as in the American Standard Code for In ...
.
If the approximate entropy characteristics of a data stream are known in advance (especially for
signal compression Signal compression is the use of various techniques to increase the quality or quantity of signal parameters transmitted through a given telecommunications channel.
Types of signal compression include:
* Bandwidth compression
* Data compression
*D ...
), a simpler static code may be useful.
These static codes include
universal codes (such as
Elias gamma coding or
Fibonacci coding
In mathematics and computing, Fibonacci coding is a universal code which encodes positive integers into binary code words. It is one example of representations of integers based on Fibonacci numbers. Each code word ends with "11" and contains n ...
) and
Golomb codes (such as
unary coding or
Rice coding).
Since 2014, data compressors have started using the
asymmetric numeral systems family of entropy coding techniques, which allows combination of the compression ratio of
arithmetic coding
Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a String (computer science), string of characters is represented using a fixed number of bits per character, as in the American Standard Code for In ...
with a processing cost similar to
Huffman coding
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by ...
.
Entropy as a measure of similarity
Besides using entropy coding as a way to compress digital data, an entropy encoder can also be used to measure the amount of
similarity between
streams of data and already existing classes of data. This is done by generating an entropy coder/compressor for each class of data; unknown data is then
classified by feeding the uncompressed data to each compressor and seeing which compressor yields the highest compression. The coder with the best compression is probably the coder trained on the data that was most similar to the unknown data.
See also
*
Arithmetic coding
Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a String (computer science), string of characters is represented using a fixed number of bits per character, as in the American Standard Code for In ...
*
Asymmetric numeral systems (ANS)
*
Context-adaptive binary arithmetic coding (CABAC)
*
Huffman coding
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by ...
*
Range coding
References
External links
*
Information Theory, Inference, and Learning Algorithms', by
David MacKay (2003), gives an introduction to Shannon theory and data compression, including the
Huffman coding
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by ...
and
arithmetic coding
Arithmetic coding (AC) is a form of entropy encoding used in lossless data compression. Normally, a String (computer science), string of characters is represented using a fixed number of bits per character, as in the American Standard Code for In ...
.
*
Source Coding'' by
T. Wiegand and H. Schwarz (2011).
{{Authority control
Entropy coding
Entropy and information
Data compression