The data processing inequality is an information theoretic concept which states that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.

Definition

Let three random variables form the

Markov chain A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happen ...

X \rightarrow Y \rightarrow Z

, implying that the conditional distribution of

Z

depends only on

Y

and is

conditionally independent In probability theory, conditional independence describes situations wherein an observation is irrelevant or redundant when evaluating the certainty of a hypothesis. Conditional independence is usually formulated in terms of conditional probabil ...

X

. Specifically, we have such a Markov chain if the joint probability mass function can be written as :

p(x,y,z) = p(x)p(y, x)p(z, y)=p(y)p(x, y)p(z, y)

In this setting, no processing of

Y

, deterministic or random, can increase the information that

Y

contains about

X

. Using the

mutual information In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the " amount of information" (in units such as ...

, this can be written as : :

I(X;Y) \geqslant I(X;Z)

With the equality

I(X;Y) = I(X;Z)

if and only if

I(X;Y\mid Z)=0

, i.e.

Z

and

Y

contain the same information about

X

, and

X \rightarrow Z \rightarrow Y

also forms a Markov chain.

Proof

One can apply the chain rule for mutual information to obtain two different decompositions of

I(X;Y,Z)

: :

I(X;Z) + I(X;Y\mid Z) = I(X;Y,Z) = I(X;Y) + I(X;Z\mid Y)

By the relationship

X \rightarrow Y \rightarrow Z

, we know that

X

and

Z

are conditionally independent, given

Y

, which means the conditional mutual information,

I(X;Z\mid Y)=0

. The data processing inequality then follows from the non-negativity of

I(X;Y\mid Z)\ge0

References

External links

*http://www.scholarpedia.org/article/Mutual_information Data processing {{Comp-sci-stub

Definition

Proof

See also

References

External links