In information theory, the error exponent of a channel code or source code over the block length of the code is the rate at which the error probability decays exponentially with the block length of the code. Formally, it is defined as the limiting ratio of the negative logarithm of the error probability to the block length of the code for large block lengths. For example, if the probability of error P_\mathrm{error} of a decoder drops as e^{-n\alpha}, where n is the block length, the error exponent is \alpha. In this example, \frac{-\ln P_\mathrm{error}}{n} approaches \alpha for large n. Many information-theoretic theorems are asymptotic in nature: for example, the channel coding theorem states that for any rate less than the channel capacity, the probability of error of the channel code can be made to go to zero as the block length goes to infinity. In practical situations, however, there are limitations on the delay of the communication and the block length must be finite. It is therefore important to study how quickly the probability of error drops as the block length grows.
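
As a concrete numerical illustration of this definition, the following short Python sketch (an added example, not part of the original text; the length-n repetition code over a binary symmetric channel with crossover probability 0.1 and majority-rule decoding are assumptions chosen only for concreteness) computes the exact block error probability for several block lengths and prints -\ln P_\mathrm{error}/n, which settles toward a limiting value: the error exponent of this particular scheme.

```python
# Added illustration (assumptions: length-n repetition code, binary symmetric
# channel with crossover probability p = 0.1, majority-rule decoding).
# The exact block error probability is a binomial tail, and -ln(P_error)/n
# settles toward a limiting value as n grows: the error exponent of this scheme.
import math

def repetition_error_probability(n, p):
    """Probability that more than half of the n transmitted bits are flipped."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = 0.1
for n in [11, 51, 101, 201, 401]:          # odd block lengths, so majority is well defined
    pe = repetition_error_probability(n, p)
    print(f"n = {n:4d}   P_error = {pe:.3e}   -ln(P_error)/n = {-math.log(pe)/n:.4f}")
```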


Error exponent in channel coding


For time-invariant DMCs

The channel coding theorem states that for any ε > 0 and for any rate less than the channel capacity, there is an encoding and decoding scheme that can be used to ensure that the probability of block error is less than ε for a sufficiently long message block ''X''. Also, for any rate greater than the channel capacity, the probability of block error at the receiver goes to one as the block length goes to infinity.

Assume a channel coding setup as follows: the channel can transmit any of M = e^{nR} messages (so that the rate R is measured in nats), by transmitting the corresponding codeword, which is of length ''n''. Each component of the codebook is drawn i.i.d. according to some probability distribution with probability mass function ''Q''. At the decoding end, maximum likelihood decoding is done.

Let X_i^n be the ith random codeword in the codebook, where i goes from 1 to M. Suppose the first message is selected, so codeword X_1^n is transmitted. Given that y_1^n is received, the probability that the codeword is incorrectly detected as X_2^n is:

:P_{\mathrm{error}\,1\to 2} = \sum_{x_2^n} Q(x_2^n)\,1\big(p(y_1^n\mid x_2^n) > p(y_1^n \mid x_1^n)\big).

The indicator function 1\big(p(y_1^n\mid x_2^n)>p(y_1^n\mid x_1^n)\big) has the upper bound

:\left(\frac{p(y_1^n\mid x_2^n)}{p(y_1^n\mid x_1^n)}\right)^s

for s > 0. Thus,

:P_{\mathrm{error}\,1\to 2} \le \sum_{x_2^n} Q(x_2^n) \left(\frac{p(y_1^n\mid x_2^n)}{p(y_1^n\mid x_1^n)}\right)^s.

Since there are a total of ''M'' messages, and the entries in the codebook are i.i.d., the probability that X_1^n is confused with any other message is M times the above expression. Using the union bound, the probability of confusing X_1^n with any message is bounded by:

:P_{\mathrm{error}\,1\to \mathrm{any}} \le M^\rho \left( \sum_{x_2^n} Q(x_2^n) \left(\frac{p(y_1^n\mid x_2^n)}{p(y_1^n\mid x_1^n)}\right)^s\right)^\rho

for any 0<\rho<1. Averaging over all combinations of x_1^n, y_1^n:

:P_{\mathrm{error}} \le M^\rho \sum_{y_1^n} \left(\sum_{x_1^n} Q(x_1^n)\, p(y_1^n\mid x_1^n)^{1-s\rho} \right) \left(\sum_{x_2^n} Q(x_2^n)\, p(y_1^n\mid x_2^n)^{s}\right)^\rho.

Choosing s = 1-s\rho (that is, s = \tfrac{1}{1+\rho}) and combining the two sums over x_1^n in the above formula:

:P_{\mathrm{error}} \le M^\rho \sum_{y_1^n} \left(\sum_{x_1^n} Q(x_1^n)\, p(y_1^n\mid x_1^n)^{\frac{1}{1+\rho}}\right)^{1+\rho}.

Using the independence of the elements of the codeword, and the discrete memoryless nature of the channel:

:P_{\mathrm{error}} \le M^\rho \prod_{i=1}^n \sum_{y_i} \left(\sum_{x_i} Q_i(x_i)\, p_i(y_i\mid x_i)^{\frac{1}{1+\rho}}\right)^{1+\rho}.

Using the fact that each element of the codeword is identically distributed and thus stationary:

:P_{\mathrm{error}} \le M^\rho \left(\sum_y \left(\sum_x Q(x)\, p(y\mid x)^{\frac{1}{1+\rho}}\right)^{1+\rho}\right)^n.

Replacing ''M'' by e^{nR} and defining

:E_o(\rho,Q)= -\ln\left(\sum_y \left(\sum_x Q(x)\, p(y\mid x)^{\frac{1}{1+\rho}} \right)^{1+\rho}\right),

the probability of error becomes

:P_\mathrm{error} \le \exp\big(-n(E_o(\rho,Q) - \rho R)\big).

''Q'' and \rho should be chosen so that the bound is tightest. Thus, the error exponent can be defined as

:E_r(R) = \max_Q \max_{\rho \in [0,1]} \left[ E_o(\rho,Q) - \rho R \right].
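
As an illustration of the final expressions above, the following Python sketch (an added example; the binary symmetric channel with crossover probability 0.1, the uniform input distribution Q, the particular rates, and the grid search over \rho are all assumptions chosen for concreteness) evaluates E_o(\rho,Q) directly from its definition and maximizes E_o(\rho,Q) - \rho R over \rho \in [0,1]. For this symmetric channel the uniform Q is also the maximizing input distribution, so the printed values approximate the error exponent E_r(R) in nats.

```python
# Added illustration (assumptions: binary symmetric channel with crossover
# probability p = 0.1, uniform input distribution Q, rates in nats, and a
# simple grid search over rho in [0, 1]).
import math

def E0(rho, Q, W):
    """E_o(rho, Q) = -ln sum_y ( sum_x Q(x) p(y|x)^(1/(1+rho)) )^(1+rho)."""
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(Q[x] * W[x][y] ** (1.0 / (1.0 + rho)) for x in range(len(Q)))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def random_coding_exponent(R, Q, W, grid=1000):
    """max over rho in [0, 1] of E_o(rho, Q) - rho * R, evaluated on a grid."""
    return max(E0(k / grid, Q, W) - (k / grid) * R for k in range(grid + 1))

p = 0.1
W = [[1 - p, p], [p, 1 - p]]               # channel transition matrix, W[x][y] = p(y | x)
Q = [0.5, 0.5]                             # uniform input distribution
for R in [0.1, 0.2, 0.3, 0.4]:             # rates in nats per channel use
    print(f"R = {R:.1f} nat   exponent = {random_coding_exponent(R, Q, W):.4f}")
```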


Error exponent in source coding


For time-invariant discrete memoryless sources

The source coding theorem states that for any \varepsilon > 0, any discrete-time i.i.d. source X, and any rate less than the entropy of the source, there is a large enough n and an encoder that takes n i.i.d. repetitions of the source, X^{1:n}, and maps them to n(H(X)+\varepsilon) binary bits such that the source symbols X^{1:n} are recoverable from the binary bits with probability at least 1 - \varepsilon.

Let M=e^{nR} be the total number of possible messages. Next, map each of the possible source output sequences to one of the messages randomly, using a uniform distribution and independently of everything else. When a source sequence is generated, the corresponding message M=m is then transmitted to the destination. The message gets decoded to one of the possible source strings. In order to minimize the probability of error, the decoder decodes to the source sequence X_1^n that maximizes P(X_1^n\mid A_m), where A_m denotes the event that message m was transmitted. This rule is equivalent to finding the source sequence X_1^n, among the set of source sequences that map to message m, that maximizes P(X_1^n). This reduction follows from the fact that the messages were assigned randomly and independently of everything else.

As an example of when an error occurs, suppose that the source sequence X_1^n(1) was mapped to message 1, as was the source sequence X_1^n(2). If X_1^n(1) was generated at the source, but P(X_1^n(2)) > P(X_1^n(1)), then an error occurs.

Let S_i denote the event that the source sequence X_1^n(i) was generated at the source, so that P(S_i) = P(X_1^n(i)). Then the probability of error can be broken down as

:P(E) = \sum_i P(E\mid S_i) P(S_i),

so attention can be focused on finding an upper bound on P(E\mid S_i).

Let A_{i,i'} denote the event that the source sequence X_1^n(i') was mapped to the same message as the source sequence X_1^n(i) and that P(X_1^n(i')) \geq P(X_1^n(i)). Thus, letting X_{i,i'} denote the event that the two source sequences i and i' map to the same message, we have that

:P(A_{i,i'}) = P\left(X_{i,i'} \cap \left\{ P(X_1^n(i')) \geq P(X_1^n(i)) \right\}\right),

and using the fact that P(X_{i,i'}) = \frac{1}{M} and is independent of everything else, we have that

:P(A_{i,i'}) = \frac{1}{M}\, P\left(P(X_1^n(i')) \geq P(X_1^n(i))\right).

A simple upper bound for the last probability can be established as

:P\left(P(X_1^n(i')) \geq P(X_1^n(i))\right) \leq \left(\frac{P(X_1^n(i'))}{P(X_1^n(i))}\right)^s

for some arbitrary real number s>0. This upper bound can be verified by noting that P\left(P(X_1^n(i')) \geq P(X_1^n(i))\right) either equals 1 or 0, because the probabilities of a given input sequence are completely deterministic. Thus, if P(X_1^n(i')) \geq P(X_1^n(i)), then \frac{P(X_1^n(i'))}{P(X_1^n(i))} \geq 1, so that the inequality holds in that case. The inequality holds in the other case as well, because

:\left(\frac{P(X_1^n(i'))}{P(X_1^n(i))}\right)^s \geq 0

for all possible source strings.

Thus, combining everything and introducing some \rho \in [0,1], we have that

:P(E\mid S_i) \leq P\left(\bigcup_{i' \neq i} A_{i,i'}\right) \leq \left( \sum_{i' \neq i} P(A_{i,i'}) \right)^\rho \leq \left( \frac{1}{M} \sum_{i' \neq i} \left( \frac{P(X_1^n(i'))}{P(X_1^n(i))} \right)^s \right)^\rho,

where the inequalities follow from a variation on the union bound. Finally, applying this upper bound to the summation for P(E), we have that

:P(E) = \sum_i P(E\mid S_i) P(S_i) \leq \sum_i P(X_1^n(i)) \left( \frac{1}{M} \sum_{i'} \left( \frac{P(X_1^n(i'))}{P(X_1^n(i))} \right)^s \right)^\rho,

where the sum can now be taken over all i' because that only increases the bound. Ultimately this yields

:P(E) \leq \frac{1}{M^\rho} \sum_i P(X_1^n(i))^{1-s\rho} \left( \sum_{i'} P(X_1^n(i'))^s \right)^\rho.

Now, for simplicity, let 1-s\rho = s, so that s = \frac{1}{1+\rho}. Substituting this new value of s into the above bound on the probability of error, and using the fact that i' is just a dummy variable in the sum, gives the following as an upper bound on the probability of error:

:P(E) \leq \frac{1}{M^\rho} \left( \sum_i P(X_1^n(i))^{\frac{1}{1+\rho}} \right)^{1+\rho}.

M=e^{nR} and each of the components of X_1^n(i) are independent. Thus, simplifying the above equation yields

:P(E) \leq \exp\left( -n \left[ \rho R - (1+\rho)\ln\left( \sum_{x_i} P(x_i)^{\frac{1}{1+\rho}} \right) \right]\right).

The term in the exponent should be maximized over \rho in order to achieve the tightest upper bound on the probability of error. Letting E_0(\rho) = (1+\rho)\ln\left( \sum_{x_i} P(x_i)^{\frac{1}{1+\rho}} \right), we see that the error exponent for the source coding case is:

:E_r(R) = \max_{\rho \in [0,1]} \left[ \rho R - E_0(\rho) \right].
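
The following Python sketch (an added example; the particular source distribution P, the rates, and the grid search over \rho are assumptions chosen for concreteness) evaluates the source-coding error exponent just derived, E_r(R) = \max_{\rho \in [0,1]} [\rho R - E_0(\rho)] with E_0(\rho) = (1+\rho)\ln\sum_x P(x)^{1/(1+\rho)}. For this P the source entropy is about 1.213 nats, so the printed exponent is zero for the rate below the entropy and positive for the rates above it.

```python
# Added illustration (assumptions: an example source distribution P, rates in
# nats per source symbol, and a simple grid search over rho in [0, 1]).
import math

def E0_source(rho, P):
    """E_0(rho) = (1 + rho) * ln( sum_x P(x)^(1/(1+rho)) )."""
    return (1.0 + rho) * math.log(sum(px ** (1.0 / (1.0 + rho)) for px in P))

def source_error_exponent(R, P, grid=1000):
    """max over rho in [0, 1] of rho * R - E_0(rho), evaluated on a grid."""
    return max((k / grid) * R - E0_source(k / grid, P) for k in range(grid + 1))

P = [0.5, 0.25, 0.125, 0.125]              # example source; H(X) = 1.75 bits (about 1.213 nats)
for R in [1.0, 1.3, 1.5, 2.0]:             # rates in nats per source symbol
    print(f"R = {R:.2f} nat   E_r(R) = {source_error_exponent(R, P):.4f}")
```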


See also

* Source coding
* Channel coding


References

R. Gallager, ''Information Theory and Reliable Communication'', Wiley, 1968.