Shannon–Fano–Elias coding

In information theory, Shannon–Fano–Elias coding is a precursor to arithmetic coding, in which probabilities are used to determine codewords.


Algorithm description

Given a discrete random variable ''X'' of ordered values to be encoded, let p(x) be the probability of any ''x'' in ''X''. Define a function

:\bar F(x) = \sum_{x_i < x} p(x_i) + \frac{1}{2} p(x)

Algorithm:
:For each ''x'' in ''X'',
::Let ''Z'' be the binary expansion of \bar F(x).
::Choose the length of the encoding of ''x'', L(x), to be the integer \left\lceil \log_2 \frac{1}{p(x)} \right\rceil + 1.
::Choose the encoding of ''x'', code(''x''), to be the first L(x) most significant bits after the binary point of ''Z''.
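
The procedure translates directly into code. The following is a minimal Python sketch of the encoder; the helper name sfe_code and the use of exact fractions are illustrative choices made here, not part of the original algorithm statement.

```python
from fractions import Fraction
from math import ceil, log2

def sfe_code(probs):
    """Build a Shannon-Fano-Elias codebook from an ordered mapping
    of symbol -> probability (exact Fractions avoid rounding error)."""
    codebook = {}
    cum = Fraction(0)                   # total probability of earlier symbols
    for symbol, p in probs.items():
        fbar = cum + p / 2              # F-bar(x) = sum_{x_i < x} p(x_i) + p(x)/2
        length = ceil(log2(1 / p)) + 1  # L(x) = ceil(log2(1/p(x))) + 1
        # Take the first L(x) bits of the binary expansion of F-bar(x).
        bits = ""
        frac = fbar
        for _ in range(length):
            frac *= 2
            bit = int(frac)             # next binary digit is the integer part
            bits += str(bit)
            frac -= bit
        codebook[symbol] = bits
        cum += p
    return codebook
```

Exact Fraction arithmetic is used so the binary expansion of \bar F(x) can be extracted digit by digit without floating-point rounding.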


Example

Let ''X'' = {''A'', ''B'', ''C'', ''D''}, with probabilities p = {1/3, 1/4, 1/6, 1/4}.

:For ''A''
::\bar F(A) = \frac{1}{2} p(A) = \frac{1}{2} \cdot \frac{1}{3} = 0.1666\ldots
::In binary, ''Z''(''A'') = 0.0010101010...
::L(A) = \left\lceil \log_2 \frac{1}{1/3} \right\rceil + 1 = \mathbf{3}
::code(''A'') is 001
:For ''B''
::\bar F(B) = p(A) + \frac{1}{2} p(B) = \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{4} = 0.4583333\ldots
::In binary, ''Z''(''B'') = 0.01110101010101...
::L(B) = \left\lceil \log_2 \frac{1}{1/4} \right\rceil + 1 = \mathbf{3}
::code(''B'') is 011
:For ''C''
::\bar F(C) = p(A) + p(B) + \frac{1}{2} p(C) = \frac{1}{3} + \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{6} = 0.66666\ldots
::In binary, ''Z''(''C'') = 0.101010101010...
::L(C) = \left\lceil \log_2 \frac{1}{1/6} \right\rceil + 1 = \mathbf{4}
::code(''C'') is 1010
:For ''D''
::\bar F(D) = p(A) + p(B) + p(C) + \frac{1}{2} p(D) = \frac{1}{3} + \frac{1}{4} + \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{4} = 0.875
::In binary, ''Z''(''D'') = 0.111
::L(D) = \left\lceil \log_2 \frac{1}{1/4} \right\rceil + 1 = \mathbf{3}
::code(''D'') is 111
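
Running the sfe_code sketch from the previous section on this alphabet reproduces the codewords above (assuming that helper is defined as shown there):

```python
from fractions import Fraction

probs = {"A": Fraction(1, 3), "B": Fraction(1, 4),
         "C": Fraction(1, 6), "D": Fraction(1, 4)}
print(sfe_code(probs))  # {'A': '001', 'B': '011', 'C': '1010', 'D': '111'}
```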


Algorithm analysis


Prefix code

Shannon–Fano–Elias coding produces a binary prefix code, allowing for direct decoding.

Let bcode(''x'') be the rational number formed by adding a binary point before a binary code. For example, if code(''C'') = 1010 then bcode(''C'') = 0.1010. For all ''x'', if no ''y'' exists such that

: \operatorname{bcode}(x) \le \operatorname{bcode}(y) < \operatorname{bcode}(x) + 2^{-L(x)}

then all the codes form a prefix code; that is, no codeword lies in the half-open interval of length 2^{-L(x)} that another codeword determines. By comparing \bar F to the cumulative distribution function of ''X'', this property may also be demonstrated graphically for Shannon–Fano–Elias coding.

By the definition of L(x) it follows that

: 2^{-L(x)} \le \frac{1}{2} p(x)

Because the bits after L(x) are truncated from \bar F(x) to form code(''x''), it follows that

: 0 \le \bar F(x) - \operatorname{bcode}(x) < 2^{-L(x)}

Now let ''y'' be any symbol that follows ''x'' in the ordering. Then

: \bar F(y) - \bar F(x) \ge \frac{1}{2} p(x) + \frac{1}{2} p(y)

and, combining the three inequalities above,

: \operatorname{bcode}(y) - \operatorname{bcode}(x) > \bar F(y) - 2^{-L(y)} - \bar F(x) \ge \bar F(y) - \frac{1}{2} p(y) - \bar F(x) \ge \frac{1}{2} p(x) \ge 2^{-L(x)}

so no codeword falls inside another codeword's interval, and the prefix property holds.
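
As a quick sanity check, prefix-freeness can also be verified by brute force; the helper name is_prefix_free below is illustrative, and the codebook is the one from the worked example:

```python
def is_prefix_free(codebook):
    """Return True if no codeword is a proper prefix of another."""
    words = list(codebook.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

print(is_prefix_free({"A": "001", "B": "011", "C": "1010", "D": "111"}))  # True
```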


Code length

The average code length is

:LC(X) = \sum_{x} p(x) L(x) = \sum_{x} p(x) \left( \left\lceil \log_2 \frac{1}{p(x)} \right\rceil + 1 \right).

Since \log_2 \frac{1}{p(x)} \le \left\lceil \log_2 \frac{1}{p(x)} \right\rceil < \log_2 \frac{1}{p(x)} + 1, it follows that for ''H''(''X''), the entropy of the random variable ''X'',

:H(X) + 1 \le LC(X) < H(X) + 2

Shannon–Fano–Elias coding thus uses between one and two extra bits per symbol from ''X'' beyond the entropy, so the code is not used in practice.
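
For the four-symbol example above, a short calculation (a sketch; the probabilities and code lengths are taken directly from the worked example) confirms this bound:

```python
from fractions import Fraction
from math import log2

probs = {"A": Fraction(1, 3), "B": Fraction(1, 4),
         "C": Fraction(1, 6), "D": Fraction(1, 4)}
lengths = {"A": 3, "B": 3, "C": 4, "D": 3}

avg = sum(p * lengths[s] for s, p in probs.items())           # LC(X) = 19/6
H = -sum(float(p) * log2(float(p)) for p in probs.values())   # H(X) ~ 1.959
print(float(avg), H)  # ~3.167 lies between H+1 ~ 2.959 and H+2 ~ 3.959
```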

