Entropy encoding
   HOME

TheInfoList



OR:

In
information theory Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. ...
, an entropy coding (or entropy encoding) is any lossless data compression method that attempts to approach the lower bound declared by Shannon's source coding theorem, which states that any lossless data compression method must have expected code length greater or equal to the entropy of the source. More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies \mathbb E_
(d(x)) D, or d, is the fourth Letter (alphabet), letter in the Latin alphabet, used in the English alphabet, modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is English alphabet#Le ...
\geq \mathbb E_ \log_b(P(x))/math>, where l is the number of symbols in a code word, d is the coding function, b is the number of symbols used to make output codes and P is the probability of the source symbol. An entropy coding attempts to approach this lower bound. Two of the most common entropy coding techniques are Huffman coding and arithmetic coding. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code may be useful. These static codes include universal codes (such as Elias gamma coding or Fibonacci coding) and Golomb codes (such as unary coding or Rice coding). Since 2014, data compressors have started using the asymmetric numeral systems family of entropy coding techniques, which allows combination of the compression ratio of arithmetic coding with a processing cost similar to Huffman coding.


Entropy as a measure of similarity

Besides using entropy coding as a way to compress digital data, an entropy encoder can also be used to measure the amount of similarity between streams of data and already existing classes of data. This is done by generating an entropy coder/compressor for each class of data; unknown data is then
classified Classified may refer to: General *Classified information, material that a government body deems to be sensitive *Classified advertising or "classifieds" Music *Classified (rapper) (born 1977), Canadian rapper * The Classified, a 1980s American ro ...
by feeding the uncompressed data to each compressor and seeing which compressor yields the highest compression. The coder with the best compression is probably the coder trained on the data that was most similar to the unknown data.


See also

* Arithmetic coding * Asymmetric numeral systems (ANS) * Context-adaptive binary arithmetic coding (CABAC) * Huffman coding * Range coding


References


External links

*
Information Theory, Inference, and Learning Algorithms
', by David MacKay (2003), gives an introduction to Shannon theory and data compression, including the Huffman coding and arithmetic coding. *
Source Coding
'' by T. Wiegand and H. Schwarz (2011). {{Compression Methods Lossless compression algorithms Entropy and information