Chain Rule For Kolmogorov Complexity

The chain rule for Kolmogorov complexity is an analogue of the chain rule for information entropy, which states:

: H(X,Y) = H(X) + H(Y|X)

That is, the combined randomness of two sequences ''X'' and ''Y'' is the sum of the randomness of ''X'' plus whatever randomness is left in ''Y'' once we know ''X''. This follows immediately from the definitions of conditional and joint entropy, and the fact from probability theory that the joint probability is the product of the marginal and conditional probability:

: P(X,Y) = P(X) P(Y|X)
: \Rightarrow \log P(X,Y) = \log P(X) + \log P(Y|X)

Taking the expectation of the negated logarithms on both sides recovers the entropy identity.
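The entropy identity is easy to verify numerically. The following Python sketch uses an arbitrary small joint distribution (an illustrative choice, not taken from the source) and computes both sides of H(X,Y) = H(X) + H(Y|X) directly from the definitions:

```python
import math
from collections import Counter

def H(dist):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# An arbitrary small joint distribution P(X, Y), chosen only to
# illustrate the identity.
joint = {('a', 0): 0.30, ('a', 1): 0.20,
         ('b', 0): 0.25, ('b', 1): 0.25}

# Marginal P(X).
marginal_x = Counter()
for (x, _), p in joint.items():
    marginal_x[x] += p

# Conditional entropy H(Y|X) = sum_x P(x) * H(Y | X = x).
h_y_given_x = sum(
    px * H({y: p / px for (a, y), p in joint.items() if a == x})
    for x, px in marginal_x.items()
)

print(H(joint))                      # ~1.9855 bits
print(H(marginal_x) + h_y_given_x)   # ~1.9855 bits, matching H(X, Y)
```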
The equivalent statement for Kolmogorov complexity does not hold exactly; it is true only up to a logarithmic term:

: K(x,y) = K(x) + K(y|x) + O(\log(K(x,y)))

(An exact version, KP(x,y) = KP(x) + KP(y|x^*) + O(1), holds for the prefix complexity ''KP'', where x^* is a shortest program for ''x''.) It states that the shortest program printing ''x'' and ''y'' is obtained by concatenating a shortest program printing ''x'' with a program printing ''y'' given ''x'', plus at most a logarithmic factor. The result implies that algorithmic mutual information, an analogue of mutual information for Kolmogorov complexity, is symmetric: I(x:y) = I(y:x) + O(\log K(x,y)) for all ''x'', ''y''.
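Kolmogorov complexity itself is uncomputable, but a real compressor gives a rough feel for the statement. The sketch below is a heuristic illustration only, using zlib-compressed length as a crude stand-in for K (the inequalities hold only approximately for a practical compressor): when ''y'' is mostly determined by ''x'', compressing the concatenation costs far less than compressing the parts independently, and the two concatenation orders cost about the same, echoing the symmetry of algorithmic mutual information.

```python
import zlib

def c(s: bytes) -> int:
    """Compressed length in bytes: a crude, computable stand-in for K."""
    return len(zlib.compress(s, 9))

x = b"abcab" * 200          # highly regular string
y = x[:-5] + b"extra"       # almost identical to x, so "y given x" is cheap

print(c(x + y))             # joint cost: close to c(x) alone
print(c(x) + c(y))          # separate costs: pays for the pattern twice
print(c(x + y), c(y + x))   # nearly equal, echoing I(x:y) = I(y:x)
```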


Proof

The ≤ direction is obvious: we can write a program to produce ''x'' and ''y'' by concatenating a program to produce ''x'', a program to produce ''y'' given access to ''x'', and (whence the log term) the length of one of the programs, so that we know where to separate the two programs for ''x'' and for ''y'' given ''x'' (\log(K(x,y)) upper-bounds this length).

For the ≥ direction, it suffices to show that for all ''k'', ''l'' such that k + l = K(x,y) we have that either

:K(x|k,l) \le k + O(1)

or

:K(y|x,k,l) \le l + O(1).

Consider the list (''a''1,''b''1), (''a''2,''b''2), ..., (''ae'',''be'') of all pairs produced by programs of length exactly K(x,y) (hence K(a,b) \le K(x,y) for every pair in the list). Note that this list
* contains the pair (''x'',''y''),
* can be enumerated given ''k'' and ''l'' (by running all programs of length k + l in parallel),
* has at most 2^{K(x,y)} elements (because there are at most 2^n programs of length ''n'').

First, suppose that ''x'' appears less than 2^l times as first element. We can specify ''y'' given ''x'', ''k'', ''l'' by enumerating (''a''1,''b''1), (''a''2,''b''2), ... and then selecting (''x'',''y'') in the sub-list of pairs (''x'',''b''). By assumption, the index of (''x'',''y'') in this sub-list is less than 2^l and hence, there is a program for ''y'' given ''x'', ''k'', ''l'' of length l + O(1). Now, suppose that ''x'' appears at least 2^l times as first element. This can happen for at most 2^{K(x,y)-l} = 2^k different strings. These strings can be enumerated given ''k'', ''l'' and hence ''x'' can be specified by its index in this enumeration. The corresponding program for ''x'' has size k + O(1). Theorem proved.
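The counting step at the heart of the ≥ direction can be simulated concretely. In the Python sketch below, an arbitrary list of at most 2^{k+l} pairs is a toy stand-in for the (uncomputable) enumeration of outputs of length-K(x,y) programs; the function checks that for any target pair one of the two cases of the dichotomy always applies:

```python
import random

def dichotomy(pairs, x, y, k, l):
    """Check the proof's counting argument on a concrete list of pairs.

    `pairs` models the enumeration of all pairs output by programs of
    length exactly k + l; it must contain (x, y) and have at most
    2**(k+l) elements."""
    assert (x, y) in pairs and len(pairs) <= 2 ** (k + l)

    # Case 1: (x, y) sits early in the sub-list of pairs whose first
    # element is x, so y is describable given x, k, l by a short index.
    sub = [b for a, b in pairs if a == x]
    if sub.index(y) < 2 ** l:
        return "K(y|x,k,l) <= l + O(1)"

    # Case 2: x appears at least 2**l times as a first element. At most
    # 2**(k+l) / 2**l = 2**k strings can do that (pigeonhole), so x is
    # describable given k, l by a short index into that enumeration.
    heavy = {a for a, _ in pairs
             if sum(1 for c, _ in pairs if c == a) >= 2 ** l}
    assert x in heavy and len(heavy) <= 2 ** k
    return "K(x|k,l) <= k + O(1)"

# Toy instance: k + l = 8, so at most 2**8 = 256 pairs.
random.seed(1)
k, l = 5, 3
pairs = [(random.randrange(16), random.randrange(16)) for _ in range(256)]
x, y = random.choice(pairs)
print(dichotomy(pairs, x, y, k, l))
```

Whichever branch fires corresponds to one of the two displayed inequalities; since k + l = K(x,y), either one gives K(x) + K(y|x) \le K(x,y) + O(\log(K(x,y))).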

