Kolmogorov complexity
In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963, and is a generalization of classical information theory. The notion of Kolmogorov complexity can be used to state and prove impossibility results akin to Cantor's diagonal argument, Gödel's incompleteness theorem, and Turing's halting problem. In particular, no program ''P'' computing a lower bound for each text's Kolmogorov complexity can return a value essentially larger than ''P'''s own length (see the section on Chaitin's incompleteness theorem below); hence no single program can compute the exact Kolmogorov complexity for infinitely many texts.


Definition

Consider the following two strings of 32 lowercase letters and digits:

    abababababababababababababababab

and

    4c1j5b2p0cv4w1x8rx2y39umgw5q85s7

The first string has a short English-language description, namely "write ab 16 times", which consists of 17 characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, i.e., "write 4c1j5b2p0cv4w1x8rx2y39umgw5q85s7", which has 38 characters. Hence the operation of writing the first string can be said to have "less complexity" than writing the second.

More formally, the complexity of a string is the length of the shortest possible description of the string in some fixed universal description language (the sensitivity of complexity relative to the choice of description language is discussed below). It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. Strings like the ''abab'' example above, whose Kolmogorov complexity is small relative to the string's size, are not considered to be complex.

The Kolmogorov complexity can be defined for any mathematical object, but for simplicity the scope of this article is restricted to strings. We must first specify a description language for strings. Such a description language can be based on any computer programming language, such as Lisp, Pascal, or Java. If P is a program which outputs a string ''x'', then P is a description of ''x''. The length of the description is just the length of P as a character string, multiplied by the number of bits in a character (e.g., 7 for ASCII).

We could, alternatively, choose an encoding for Turing machines, where an ''encoding'' is a function which associates to each Turing machine M a bitstring ⟨M⟩. If M is a Turing machine which, on input ''w'', outputs string ''x'', then the concatenated string ⟨M⟩ ''w'' is a description of ''x''. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed.

Any string ''s'' has at least one description. For example, the second string above is output by the pseudo-code:

    function GenerateString2()
        return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"

whereas the first string is output by the (much shorter) pseudo-code:

    function GenerateString1()
        return "ab" × 16

If a description ''d''(''s'') of a string ''s'' is of minimal length (i.e., using the fewest bits), it is called a minimal description of ''s'', and the length of ''d''(''s'') (i.e. the number of bits in the minimal description) is the Kolmogorov complexity of ''s'', written ''K''(''s''). Symbolically,

    ''K''(''s'') = |''d''(''s'')|.

The length of the shortest description will depend on the choice of description language; but the effect of changing languages is bounded (a result called the ''invariance theorem'').
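As a concrete, hedged illustration (using Python source code as the description language, a choice not made by the article), the two pseudo-programs above can be written out and their lengths compared; each length is an upper bound on the complexity of the corresponding string in this particular language:

    # A sketch, not a definitive implementation: descriptions are Python
    # source strings, and description length is measured in characters.
    desc1 = 'print("ab" * 16)'                             # exploits the regularity
    desc2 = 'print("4c1j5b2p0cv4w1x8rx2y39umgw5q85s7")'    # quotes the string verbatim

    print(len(desc1), len(desc2))   # 16 vs 41: upper bounds on the two complexities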


Invariance theorem


Informal treatment

There are some description languages which are optimal, in the following sense: given any description of an object in a description language, said description may be used in the optimal description language with a constant overhead. The constant depends only on the languages involved, not on the description of the object, nor the object being described.

Here is an example of an optimal description language. A description will have two parts:

* The first part describes another description language.
* The second part is a description of the object in that language.

In more technical terms, the first part of a description is a computer program (specifically: a compiler for the object's language, written in the description language), with the second part being the input to that computer program which produces the object as output. A small sketch of this construction follows this section.

The invariance theorem follows: given any description language ''L'', the optimal description language is at least as efficient as ''L'', with some constant overhead.

Proof: Any description ''D'' in ''L'' can be converted into a description in the optimal language by first describing ''L'' as a computer program ''P'' (part 1), and then using the original description ''D'' as input to that program (part 2). The total length of this new description ''D′'' is (approximately)

    |''D′''| = |''P''| + |''D''|.

The length of ''P'' is a constant that doesn't depend on ''D''. So, there is at most a constant overhead, regardless of the object described. Therefore, the optimal language is universal up to this additive constant.
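A minimal sketch of the two-part construction in Python (the names are hypothetical; a real encoding would also need a self-delimiting way to mark where part 1 ends and part 2 begins, which is glossed over here):

    def two_part_description(interpreter_source, description):
        # Part 1: a program (compiler/interpreter) for the other language L.
        # Part 2: the description of the object, written in L.
        return interpreter_source + description

    interpreter = "def run(d): ..."     # hypothetical interpreter for L
    d = '"ab" * 16'                     # a description of the object in L
    d_prime = two_part_description(interpreter, d)
    assert len(d_prime) == len(interpreter) + len(d)   # |D'| = |P| + |D|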


A more formal treatment

Theorem: If ''K''1 and ''K''2 are the complexity functions relative to Turing complete description languages ''L''1 and ''L''2, then there is a constant ''c'' – which depends only on the languages ''L''1 and ''L''2 chosen – such that

    ∀''s''. −''c'' ≤ ''K''1(''s'') − ''K''2(''s'') ≤ ''c''.

Proof: By symmetry, it suffices to prove that there is some constant ''c'' such that for all strings ''s''

    ''K''1(''s'') ≤ ''K''2(''s'') + ''c''.

Now, suppose there is a program in the language ''L''1 which acts as an interpreter for ''L''2:

    function InterpretLanguage(string ''p'')

where ''p'' is a program in ''L''2. The interpreter is characterized by the following property: running InterpretLanguage on input ''p'' returns the result of running ''p''. Thus, if P is a program in ''L''2 which is a minimal description of ''s'', then InterpretLanguage(P) returns the string ''s''. The length of this description of ''s'' is the sum of

1. the length of the program InterpretLanguage, which we can take to be the constant ''c'';
2. the length of P, which by definition is ''K''2(''s'').

This proves the desired upper bound.
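As an illustrative sketch (the toy language here is an invention for this example), an interpreter for some language ''L''2, written in our language ''L''1, witnesses the bound ''K''1(''s'') ≤ ''K''2(''s'') + ''c'', with ''c'' the interpreter's length:

    def interpret_language(p):
        # Interpreter for a toy L2: a program has the form "<text>|<count>"
        # and denotes <text> repeated <count> times.
        text, count = p.rsplit("|", 1)
        return text * int(count)

    p = "ab|16"                       # an L2-description of "abab...ab"
    print(interpret_language(p))      # the string s that p describes
    # Any L2-description of s becomes an L1-description of s at the cost of
    # the constant-length interpreter above.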


History and context

Algorithmic information theory is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other data structures).

The concept and theory of Kolmogorov complexity is based on a crucial theorem first discovered by Ray Solomonoff, who published it in 1960, describing it in "A Preliminary Report on a General Theory of Inductive Inference" as part of his invention of algorithmic probability. He gave a more complete description in his 1964 publications, "A Formal Theory of Inductive Inference," Part 1 and Part 2 in ''Information and Control''. Andrey Kolmogorov later independently published this theorem in ''Problems Inform. Transmission'' in 1965. Gregory Chaitin also presents this theorem in ''J. ACM'' – Chaitin's paper was submitted October 1966 and revised in December 1968, and cites both Solomonoff's and Kolmogorov's papers.

The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one. This algorithm, for all strings, allows codes as short as allowed by any other algorithm up to an additive constant that depends on the algorithms, but not on the strings themselves. Solomonoff used this algorithm and the code lengths it allows to define a "universal probability" of a string on which inductive inference of the subsequent digits of the string can be based. Kolmogorov used this theorem to define several functions of strings, including complexity, randomness, and information.

When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority. For several years, Solomonoff's work was better known in the Soviet Union than in the Western world. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was concerned with randomness of a sequence, while algorithmic probability became associated with Solomonoff, who focused on prediction using his invention of the universal prior probability distribution. The broader area encompassing descriptional complexity and probability is often called Kolmogorov complexity. The computer scientist Ming Li considers this an example of the Matthew effect: "...to everyone who has, more will be given..."

There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs, and is mainly due to Leonid Levin (1974). An axiomatic approach to Kolmogorov complexity based on Blum axioms (Blum 1967) was introduced by Mark Burgin in a paper presented for publication by Andrey Kolmogorov (Burgin, M. (1982), "Generalized Kolmogorov complexity and duality in theory of computations", ''Notices of the Russian Academy of Sciences'', v. 25, No. 3, pp. 19–23).


Basic results

In the following discussion, let ''K''(''s'') be the complexity of the string ''s''. It is not hard to see that the minimal description of a string cannot be too much larger than the string itself: a program like GenerateString2 above that outputs ''s'' verbatim is only a fixed amount larger than ''s''.

Theorem: There is a constant ''c'' such that

    ∀''s''. ''K''(''s'') ≤ |''s''| + ''c''.
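A sketch of the witness for this bound (again using Python as the description language, and assuming ''s'' contains no characters that would need escaping):

    def literal_description(s):
        # A program that prints s verbatim: its length is |s| plus a constant.
        return f'print("{s}")'

    s = "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"
    prog = literal_description(s)
    print(len(prog) - len(s))   # the constant c; here 9 characters of syntax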


Uncomputability of Kolmogorov complexity


A naive attempt at a program to compute ''K''

At first glance it might seem trivial to write a program which can compute ''K''(''s'') for any ''s'', such as the following:

    function KolmogorovComplexity(string s)
        for i = 1 to infinity:
            for each string p of length exactly i
                if isValidProgram(p) and evaluate(p) == s
                    return i

This program iterates through all possible programs (by iterating through all possible strings and only considering those which are valid programs), starting with the shortest. Each program is executed to find the result produced by that program, comparing it to the input ''s''. If the result matches, then the length of the program is returned. However, this will not work, because some of the programs ''p'' tested will not terminate, e.g. if they contain infinite loops. There is no way to avoid all of these programs by testing them in some way before executing them, due to the non-computability of the halting problem. What is more, no program at all can compute the function ''K'', be it ever so sophisticated. This is proven in the following.
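To make the obstruction concrete, here is a small runnable Python sketch (the toy language and helper names are inventions for illustration). Because every program in this toy language halts, the exhaustive search terminates and computes the toy complexity exactly; in a Turing-complete language, evaluate(p) could run forever on some ''p'', and no such search can compute ''K'':

    from itertools import product

    def evaluate(p):
        # Toy, total language over {'a', 'b', 'D'}: 'a'/'b' append a letter,
        # 'D' doubles the output built so far. Every program halts.
        out = ""
        for c in p:
            if c in "ab":
                out += c
            elif c == "D":
                out += out
        return out

    def toy_complexity(s):
        # Brute-force search over programs in order of increasing length.
        # Spelling s out literally always works, so the search terminates.
        for i in range(1, len(s) + 1):
            for p in map("".join, product("abD", repeat=i)):
                if evaluate(p) == s:
                    return i

    print(toy_complexity("abababab"))  # 4, via the program "abDD"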


Formal proof of uncomputability of ''K''

Theorem: There exist strings of arbitrarily large Kolmogorov complexity. Formally: for each natural number ''n'', there is a string ''s'' with ''K''(''s'') ≥ ''n''. (However, an ''s'' with ''K''(''s'') = ''n'' need not exist for every ''n''. For example, if ''n'' is not a multiple of 7, no ASCII program can have a length of exactly ''n'' bits.)

Proof: Otherwise all of the infinitely many possible finite strings could be generated by the finitely many programs with a complexity below ''n'' bits. (There are 1 + 2 + 2^2 + 2^3 + ... + 2^''n'' = 2^(''n''+1) − 1 different program texts of length up to ''n'' bits; cf. geometric series. If program lengths are to be multiples of 7 bits, even fewer program texts exist.)

Theorem: ''K'' is not a computable function. In other words, there is no program which takes any string ''s'' as input and produces the integer ''K''(''s'') as output.

The following proof by contradiction uses a simple Pascal-like language to denote programs; for the sake of proof simplicity, assume its description (i.e. an interpreter) to have a length of 1400000 bits. Assume for contradiction there is a program

    function KolmogorovComplexity(string s)

which takes as input a string ''s'' and returns ''K''(''s''). All programs are of finite length, so, for the sake of proof simplicity, assume it to be 7000000000 bits. Now, consider the following program of length 1288 bits:

    function GenerateComplexString()
        for i = 1 to infinity:
            for each string s of length exactly i
                if KolmogorovComplexity(s) ≥ 8000000000
                    return s

Using KolmogorovComplexity as a subroutine, the program tries every string, starting with the shortest, until it returns a string with Kolmogorov complexity at least 8000000000 bits (by the previous theorem, such a string exists, hence the loop will eventually terminate), i.e. a string that cannot be produced by any program shorter than 8000000000 bits. However, the overall length of the above program that produced ''s'' is only 7001401288 bits (including the language interpreter and the subroutine code for KolmogorovComplexity), which is a contradiction. (If the code of KolmogorovComplexity is shorter, the contradiction remains. If it is longer, the constant used in GenerateComplexString can always be changed appropriately: if KolmogorovComplexity has length ''n'' bits, the constant ''m'' used in GenerateComplexString needs to be adapted to satisfy ''n'' + 1400000 + 1288 + 7·log10(''m'') < ''m'', which is always possible since ''m'' grows faster than log10(''m'').)

The above proof uses a contradiction similar to that of the Berry paradox: "The smallest positive integer that cannot be defined in fewer than twenty English words". It is also possible to show the non-computability of ''K'' by reduction from the non-computability of the halting problem ''H'', since ''K'' and ''H'' are Turing-equivalent. There is a corollary, humorously called the "full employment theorem" in the programming language community, stating that there is no perfect size-optimizing compiler.


Chain rule for Kolmogorov complexity

The chain rule for Kolmogorov complexity states that

    ''K''(''X'',''Y'') ≤ ''K''(''X'') + ''K''(''Y''|''X'') + ''O''(log(''K''(''X'',''Y''))).

It states that the shortest program that reproduces ''X'' and ''Y'' is no more than a logarithmic term larger than a program to reproduce ''X'' and a program to reproduce ''Y'' given ''X''. Using this statement, one can define an analogue of mutual information for Kolmogorov complexity.
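A standard way to write this analogue (stated here only up to the logarithmic precision of the chain rule) is to define the algorithmic mutual information as

\[
I(X : Y) = K(Y) - K(Y \mid X),
\]

which, by the chain rule, satisfies

\[
I(X : Y) = K(X) + K(Y) - K(X, Y) + O(\log K(X, Y)),
\]

so the quantity is symmetric in ''X'' and ''Y'' up to logarithmic terms.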


Compression

It is straightforward to compute upper bounds for ''K''(''s''): simply compress the string ''s'' with some method, implement the corresponding decompressor in the chosen language, concatenate the decompressor to the compressed string, and measure the length of the resulting string – concretely, the size of a self-extracting archive in the given language.

A string ''s'' is compressible by a number ''c'' if it has a description whose length does not exceed |''s''| − ''c'' bits. This is equivalent to saying that ''K''(''s'') ≤ |''s''| − ''c''. Otherwise, ''s'' is incompressible by ''c''. A string incompressible by 1 is said to be simply ''incompressible''. By the pigeonhole principle, which applies because every compressed string maps to only one uncompressed string, incompressible strings must exist, since there are 2^''n'' bit strings of length ''n'', but only 2^''n'' − 1 shorter strings, that is, strings of length less than ''n'' (i.e. with length 0, 1, ..., ''n'' − 1). (As there are 2^''L'' strings of length ''L'', the number of strings of lengths 0, 1, ..., ''n'' − 1 is 1 + 2 + ... + 2^(''n''−1), which is a finite geometric series with sum 2^''n'' − 1.)

For the same reason, most strings are complex in the sense that they cannot be significantly compressed – their ''K''(''s'') is not much smaller than |''s''|, the length of ''s'' in bits. To make this precise, fix a value of ''n''. There are 2^''n'' bitstrings of length ''n''. The uniform probability distribution on the space of these bitstrings assigns exactly equal weight 2^(−''n'') to each string of length ''n''.

Theorem: With the uniform probability distribution on the space of bitstrings of length ''n'', the probability that a string is incompressible by ''c'' is at least 1 − 2^(−''c''+1) + 2^(−''n'').

Proof: The number of descriptions of length not exceeding ''n'' − ''c'' is given by the geometric series

    1 + 2 + 2^2 + ... + 2^(''n''−''c'') = 2^(''n''−''c''+1) − 1.

There remain at least

    2^''n'' − 2^(''n''−''c''+1) + 1

bitstrings of length ''n'' that are incompressible by ''c''. To determine the probability, divide by 2^''n''.
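As a concrete illustration, any off-the-shelf compressor yields an upper bound of this kind. A small Python sketch (ignoring the constant-size decompressor, which a true self-extracting bound would have to include):

    import os
    import zlib

    regular = b"ab" * 16                  # highly regular: compresses well
    random_looking = os.urandom(32)       # incompressible with high probability

    print(len(regular), len(zlib.compress(regular, 9)))
    print(len(random_looking), len(zlib.compress(random_looking, 9)))
    # The compressed length (plus the fixed decompressor size) is an upper
    # bound on K(s); for most strings, no compressor can do much better.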


Chaitin's incompleteness theorem

By the above theorem, most strings are complex in the sense that they cannot be described in any significantly "compressed" way. However, it turns out that the fact that a specific string is complex cannot be formally proven, if the complexity of the string is above a certain threshold. The precise formalization is as follows. First, fix a particular axiomatic system S for the natural numbers. The axiomatic system has to be powerful enough so that, to certain assertions A about complexity of strings, one can associate a formula FA in S. This association must have the following property: if FA is provable from the axioms of S, then the corresponding assertion A must be true. This "formalization" can be achieved based on a Gödel numbering.

Theorem: There exists a constant ''L'' (which only depends on S and on the choice of description language) such that there does not exist a string ''s'' for which the statement

    ''K''(''s'') ≥ ''L''       (as formalized in S)

can be proven within S.

Proof idea: The proof of this result is modeled on a self-referential construction used in Berry's paradox. We first obtain a program which enumerates the proofs within S, and we specify a procedure ''P'' which takes as input an integer ''L'' and prints the strings ''x'' which occur in proofs within S of the statement ''K''(''x'') ≥ ''L''. By then setting ''L'' greater than the length of this procedure ''P'', we obtain a contradiction: a string ''x'' whose complexity is provably at least ''L'' was nevertheless printed by a procedure of length less than ''L''. So it is not possible for the proof system S to prove ''K''(''x'') ≥ ''L'' for ''L'' arbitrarily large; in particular, for any ''L'' larger than the length of the procedure ''P'' (which is finite).

Proof: We can find an effective enumeration of all the formal proofs in S by some procedure

    function NthProof(int ''n'')

which takes as input ''n'' and outputs some proof. This function enumerates all proofs. Some of these are proofs for formulas we do not care about here, since every possible proof in the language of S is produced for some ''n''. Some of these are complexity formulas of the form ''K''(''s'') ≥ ''n'', where ''s'' and ''n'' are constants in the language of S. There is a procedure

    function NthProofProvesComplexityFormula(int ''n'')

which determines whether the ''n''th proof actually proves a complexity formula ''K''(''s'') ≥ ''L''. The strings ''s'', and the integer ''L'' in turn, are computable by procedures:

    function StringNthProof(int ''n'')
    function ComplexityLowerBoundNthProof(int ''n'')

Consider the following procedure:

    function GenerateProvablyComplexString(int ''n'')
        for i = 1 to infinity:
            if NthProofProvesComplexityFormula(i) and ComplexityLowerBoundNthProof(i) ≥ ''n''
                return StringNthProof(''i'')

Given an ''n'', this procedure tries every proof until it finds a string and a proof in the formal system S of the formula ''K''(''s'') ≥ ''L'' for some ''L'' ≥ ''n''; if no such proof exists, it loops forever.

Finally, consider the program consisting of all these procedure definitions and a main call:

    GenerateProvablyComplexString(''n''0)

where the constant ''n''0 will be determined later on. The overall program length can be expressed as ''U'' + log2(''n''0), where ''U'' is some constant and log2(''n''0) represents the length of the integer value ''n''0, under the reasonable assumption that it is encoded in binary digits. We will choose ''n''0 to be greater than the program length, that is, such that ''n''0 > ''U'' + log2(''n''0). This is clearly true for ''n''0 sufficiently large, because the left hand side grows linearly in ''n''0 whilst the right hand side grows logarithmically in ''n''0 up to the fixed constant ''U''.

Then no proof of the form "''K''(''s'') ≥ ''L''" with ''L'' ≥ ''n''0 can be obtained in S, as can be seen by an indirect argument: if ComplexityLowerBoundNthProof(i) could return a value ≥ ''n''0, then the loop inside GenerateProvablyComplexString would eventually terminate, and that procedure would return a string ''s'' such that ''K''(''s'') ≥ ''n''0 holds (since the proof system only proves true assertions), while at the same time ''K''(''s'') ≤ ''U'' + log2(''n''0) < ''n''0 (since ''s'' was produced by the above program of that length). This is a contradiction, Q.E.D.

As a consequence, the above program, with the chosen value of ''n''0, must loop forever. Similar ideas are used to prove the properties of Chaitin's constant.


Minimum message length

The minimum message length principle of statistical and inductive inference and machine learning was developed by C.S. Wallace and D.M. Boulton in 1968. MML is Bayesian (i.e. it incorporates prior beliefs) and information-theoretic. It has the desirable properties of statistical invariance (i.e. the inference transforms with a re-parametrisation, such as from polar coordinates to Cartesian coordinates), statistical consistency (i.e. even for very hard problems, MML will converge to any underlying model) and efficiency (i.e. the MML model will converge to any true underlying model about as quickly as is possible). C.S. Wallace and D.L. Dowe (1999) showed a formal connection between MML and algorithmic information theory (or Kolmogorov complexity).


Kolmogorov randomness

''Kolmogorov randomness'' defines a string (usually of bits) as being random if and only if every computer program that can produce that string is at least as long as the string itself. To make this precise, a universal computer (or universal Turing machine) must be specified, so that "program" means a program for this universal machine. A random string in this sense is "incompressible" in that it is impossible to "compress" the string into a program that is shorter than the string itself. For every universal computer, there is at least one algorithmically random string of each length. Whether a particular string is random, however, depends on the specific universal computer that is chosen. This is because a universal computer can have a particular string hard-coded in itself, and a program running on this universal computer can then simply refer to this hard-coded string using a short sequence of bits (i.e. much shorter than the string itself).

This definition can be extended to define a notion of randomness for ''infinite'' sequences from a finite alphabet. These algorithmically random sequences can be defined in three equivalent ways. One way uses an effective analogue of measure theory; another uses effective martingales. The third way defines an infinite sequence to be random if the prefix-free Kolmogorov complexity of its initial segments grows quickly enough: there must be a constant ''c'' such that the complexity of an initial segment of length ''n'' is always at least ''n'' − ''c''. This definition, unlike the definition of randomness for a finite string, is not affected by which universal machine is used to define prefix-free Kolmogorov complexity.
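A standard way to state this third characterization (the Levin–Schnorr theorem) in symbols: an infinite sequence ω is algorithmically random if and only if

\[
\exists c \; \forall n : \; K(\omega_{1..n}) \ge n - c,
\]

where ''K'' denotes prefix-free Kolmogorov complexity and ω_{1..''n''} is the initial segment of length ''n''.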


Relation to entropy

For dynamical systems, entropy rate and algorithmic complexity of the trajectories are related by a theorem of Brudno: the equality ''K''(''x'';''T'') = ''h''(''T'') holds for almost all ''x''. It can be shown that for the output of Markov information sources, Kolmogorov complexity is related to the entropy of the information source. More precisely, the Kolmogorov complexity of the output of a Markov information source, normalized by the length of the output, converges almost surely (as the length of the output goes to infinity) to the entropy of the source.
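In symbols, a standard way to write the convergence result stated above: for a source emitting a sequence ''x'' with entropy rate ''h'',

\[
\lim_{n \to \infty} \frac{K(x_{1..n})}{n} = h \quad \text{almost surely.}
\]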


Conditional versions

The conditional Kolmogorov complexity of two strings ''K''(''x''|''y'') is, roughly speaking, defined as the Kolmogorov complexity of ''x'' given ''y'' as an auxiliary input to the procedure. There is also a length-conditional complexity ''K''(''x''|''L''(''x'')), which is the complexity of ''x'' given the length of ''x'' as known/input.
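An operational illustration (not the formal definition): compressors expose the same idea of auxiliary input via preset dictionaries. In this Python sketch, the description of ''x'' typically gets shorter when ''y'' is supplied as side information, mirroring the fact that ''K''(''x''|''y'') can be much smaller than ''K''(''x''):

    import zlib

    y = b"conditional complexity treats auxiliary input as side information"
    x = y + b", as this sketch illustrates"

    without_y = zlib.compress(x, 9)

    co = zlib.compressobj(level=9, zdict=y)   # y given as auxiliary input
    with_y = co.compress(x) + co.flush()

    print(len(without_y), len(with_y))        # knowing y shrinks x's description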


See also

* Important publications in algorithmic information theory
* Berry paradox
* Code golf
* Data compression
* Descriptive complexity theory
* Grammar induction
* Inductive inference
* Kolmogorov structure function
* Levenshtein distance
* Solomonoff's theory of inductive inference
* Sample entropy



External links


* The Legacy of Andrei Nikolaevich Kolmogorov
* Chaitin's online publications
* Tromp's lambda calculus computer model offers a concrete definition of K()
* Universal AI based on Kolmogorov Complexity, by Marcus Hutter
* David Dowe's Minimum Message Length (MML) and Occam's razor pages