In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963, and is a generalization of classical information theory.
The notion of Kolmogorov complexity can be used to state and prove impossibility results akin to Cantor's diagonal argument, Gödel's incompleteness theorem, and Turing's halting problem. In particular, no program ''P'' computing a lower bound for each text's Kolmogorov complexity can return a value essentially larger than ''P'''s own length (see below); hence no single program can compute the exact Kolmogorov complexity for infinitely many texts.
Definition
Intuition
Consider the following two strings of 32 lowercase letters and digits:
:
abababababababababababababababab
, and
:
4c1j5b2p0cv4w1x8rx2y39umgw5q85s7
The first string has a short English-language description, namely "write ab 16 times", which consists of 17 characters. The second one has no obvious simple description (using the same character set) other than writing down the string itself, i.e., "write 4c1j5b2p0cv4w1x8rx2y39umgw5q85s7" which has 38 characters. Hence the operation of writing the first string can be said to have "less complexity" than writing the second.
More formally, the complexity of a string is the length of the shortest possible description of the string in some fixed universal description language (the sensitivity of complexity relative to the choice of description language is discussed below). It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. Strings like the ''abab'' example above, whose Kolmogorov complexity is small relative to the string's size, are not considered to be complex.
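A general-purpose compressor gives a crude upper bound in this spirit: the compressed size of a string is the length of one particular description of it. The following sketch uses Python's standard zlib module; the compressor stands in for a fixed description language and only bounds, never computes, the complexity:
import zlib

regular = b"ab" * 16
random_looking = b"4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"

# The regular string compresses to well under its 32-byte length.
print(len(zlib.compress(regular, 9)))
# The random-looking string does not compress; the output is even
# slightly longer than the input because of format overhead.
print(len(zlib.compress(random_looking, 9)))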
The Kolmogorov complexity can be defined for any mathematical object, but for simplicity the scope of this article is restricted to strings. We must first specify a description language for strings. Such a description language can be based on any computer programming language, such as Lisp, Pascal, or Java. If P is a program which outputs a string ''x'', then P is a description of ''x''. The length of the description is just the length of P as a character string, multiplied by the number of bits in a character (e.g., 7 for ASCII).
We could, alternatively, choose an encoding for Turing machines, where an ''encoding'' is a function which associates to each Turing Machine M a bitstring <''M''>. If M is a Turing Machine which, on input ''w'', outputs string ''x'', then the concatenated string <''M''> ''w'' is a description of ''x''. For theoretical analysis, this approach is more suited for constructing detailed formal proofs and is generally preferred in the research literature. In this article, an informal approach is discussed.
Any string ''s'' has at least one description. For example, the second string above is output by the pseudo-code:
function GenerateString2()
return "4c1j5b2p0cv4w1x8rx2y39umgw5q85s7"
whereas the first string is output by the (much shorter) pseudo-code:
function GenerateString1()
return "ab" × 16
If a description ''d''(''s'') of a string ''s'' is of minimal length (i.e., using the fewest bits), it is called a minimal description of ''s'', and the length of ''d''(''s'') (i.e. the number of bits in the minimal description) is the Kolmogorov complexity of ''s'', written ''K''(''s''). Symbolically,
:''K''(''s'') = |''d''(''s'')|.
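To make the definition concrete, the following Python sketch fixes a toy description language in which a description is either the literal string prefixed with "L", or a repeat form "R<count>,<block>"; the minimal description within this toy language is just the shortest candidate. The language is invented purely for illustration; genuine Kolmogorov complexity is defined over a universal language and is not computable.
def toy_minimal_description(s):
    # Candidate 1: the literal description.
    candidates = ["L" + s]
    # Candidate 2: repeat descriptions "R<count>,<block>" for every block
    # length that tiles the string exactly.
    for size in range(1, len(s) // 2 + 1):
        if len(s) % size == 0 and s[:size] * (len(s) // size) == s:
            candidates.append("R" + str(len(s) // size) + "," + s[:size])
    # The minimal description is the shortest candidate.
    return min(candidates, key=len)

print(toy_minimal_description("ab" * 16))  # "R16,ab": 6 characters for a 32-character string
print(len(toy_minimal_description("4c1j5b2p0cv4w1x8rx2y39umgw5q85s7")))  # 33: the literal wins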
The length of the shortest description will depend on the choice of description language; but the effect of changing languages is bounded (a result called the ''invariance theorem'', see below).
Plain Kolmogorov complexity ''C''
There are two definitions of Kolmogorov complexity: ''plain'' and ''prefix-free''. The plain complexity is the minimal description length of any program, and is denoted ''C''(''x''), while the prefix-free complexity is the minimal description length of any program encoded in a prefix-free code, and is denoted ''K''(''x''). The plain complexity is more intuitive, but the prefix-free complexity is easier to study.
By default, all equations hold only up to an additive constant. For example, ''f''(''x'') = ''g''(''x'') really means that ''f''(''x'') = ''g''(''x'') + ''O''(1), that is, there exists a constant ''c'' such that |''f''(''x'') − ''g''(''x'')| ≤ ''c'' for all ''x''.
Let ''U'' : {0, 1}* → {0, 1}* be a computable function mapping finite binary strings to binary strings. It is a universal function if, and only if, for any computable ''f'' : {0, 1}* → {0, 1}*, we can encode the function in a "program" ''s_f'', such that ''U''(''s_f'' ''x'') = ''f''(''x'') for all ''x''. We can think of ''U'' as a program interpreter, which takes in an initial segment describing the program, followed by data that the program should process.
One problem with plain complexity is that ''C''(''xy'') ≤ ''C''(''x'') + ''C''(''y'') fails in general, because intuitively speaking, there is no general way to tell where to divide an output string ''xy'' just by looking at the concatenated string. We can divide it by specifying the length of ''x'' or ''y'', but that would take ''O''(log min(|''x''|, |''y''|)) extra symbols. Indeed, for any ''c'' > 0 there exist ''x'', ''y'' such that ''C''(''xy'') ≥ ''C''(''x'') + ''C''(''y'') + ''c''.
Typically, inequalities with plain complexity have a term like ''O''(log min(|''x''|, |''y''|)) on one side, whereas the same inequalities with prefix-free complexity have only ''O''(1).
The main problem with plain complexity is that something extra is sneaked into a program. A program not only represents something with its code, but also represents its own length: a program of length ''l'' conveys the number ''l'' for free, which amounts to about log2 ''l'' extra bits of information. Stated in another way, it is as if we were using a termination symbol to denote where a word ends, and so we are using not 2 symbols, but 3. To fix this defect, we introduce the prefix-free Kolmogorov complexity.
Prefix-free Kolmogorov complexity ''K''
A prefix-free code is a subset of {0, 1}* such that given any two different words in the set, neither is a prefix of the other. The benefit of a prefix-free code is that we can build a machine that reads words from the code forward in one direction, and as soon as it reads the last symbol of the word, it ''knows'' that the word is finished, and does not need to backtrack or use a termination symbol.
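For illustration, the prefix-free property is easy to state in code (a minimal sketch in Python; the quadratic scan favors clarity over efficiency):
def is_prefix_free(words):
    # No word may be a proper prefix of another word in the set.
    for a in words:
        for b in words:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))  # True
print(is_prefix_free(["0", "01", "10"]))          # False: "0" begins "01"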
Define a prefix-free Turing machine to be a Turing machine that comes with a prefix-free code, such that the Turing machine can read any string from the code in one direction, and stop reading as soon as it reads the last symbol. Afterwards, it may compute on a work tape and write to a write tape, but it cannot move its read-head anymore.
This gives us the following formal way to describe ''K''.
* Fix a prefix-free universal Turing machine, with three tapes: a read tape infinite in one direction, a work tape infinite in two directions, and a write tape infinite in one direction.
* The machine can read from the read tape in one direction only (no backtracking), and write to the write tape in one direction only. It can read and write the work tape in both directions.
* The work tape and write tape start with all zeros. The read tape starts with an input prefix code, followed by all zeros.
* Let ''S'' be the prefix-free code on {0, 1}* used by the universal Turing machine.
Note that some universal Turing machines may not be programmable with prefix codes. We must pick only a prefix-free universal Turing machine.
The prefix-free complexity of a string ''x'' is the length of the shortest code word that makes the machine output ''x'':
:''K''(''x'') = min { |''p''| : ''p'' ∈ ''S'', ''U''(''p'') = ''x'' }.
Invariance theorem
Informal treatment
There are some description languages which are optimal, in the following sense: given any description of an object in a description language, said description may be used in the optimal description language with a constant overhead. The constant depends only on the languages involved, not on the description of the object, nor the object being described.
Here is an example of an optimal description language. A description will have two parts:
* The first part describes another description language.
* The second part is a description of the object in that language.
In more technical terms, the first part of a description is a computer program (specifically: a compiler for the object's language, written in the description language), with the second part being the input to that computer program which produces the object as output.
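Here is a minimal sketch of the two-part scheme in Python; the variable names and the use of eval as a stand-in for the compiler are illustrative assumptions, not part of the definition:
# Part 1: a description language, given as program text.
part1 = 'lambda n: "ab" * n'
# Part 2: the input to that program.
part2 = 16

def run_description(program_text, data):
    # Interpreting part 1 as a program and running it on part 2
    # produces the described object.
    return eval(program_text)(data)

print(run_description(part1, part2))  # prints "ab" repeated 16 times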
The invariance theorem follows: Given any description language ''L'', the optimal description language is at least as efficient as ''L'', with some constant overhead.
Proof: Any description ''D'' in ''L'' can be converted into a description in the optimal language by first describing ''L'' as a computer program ''P'' (part 1), and then using the original description ''D'' as input to that program (part 2). The total length of this new description ''D′'' is (approximately):
:|''D′''| = |''P''| + |''D''|
The length of ''P'' is a constant that doesn't depend on ''D''. So, there is at most a constant overhead, regardless of the object described. Therefore, the optimal language is universal up to this additive constant.
A more formal treatment
Theorem: If ''K''1 and ''K''2 are the complexity functions relative to Turing complete description languages ''L''1 and ''L''2, then there is a constant ''c'' – which depends only on the languages ''L''1 and ''L''2 chosen – such that
:∀''s''. −''c'' ≤ ''K''1(''s'') − ''K''2(''s'') ≤ ''c''.
Proof: By symmetry, it suffices to prove that there is some constant ''c'' such that for all strings ''s''
:''K''1(''s'') ≤ ''K''2(''s'') + ''c''.
Now, suppose there is a program in the language ''L''1 which acts as an interpreter for ''L''2:
function InterpretLanguage(string ''p'')
where ''p'' is a program in ''L''2. The interpreter is characterized by the following property:
: Running InterpretLanguage on input ''p'' returns the result of running ''p''.
Thus, if P is a program in ''L''2 which is a minimal description of ''s'', then InterpretLanguage(P) returns the string ''s''. The length of this description of ''s'' is the sum of
# The length of the program InterpretLanguage, which we can take to be the constant ''c''.
# The length of P, which by definition is ''K''2(''s'').
This proves the desired upper bound.
History and context
Algorithmic information theory is the area of computer science that studies Kolmogorov complexity and other complexity measures on strings (or other data structures).
The concept and theory of Kolmogorov Complexity is based on a crucial theorem first discovered by Ray Solomonoff, who published it in 1960, describing it in "A Preliminary Report on a General Theory of Inductive Inference" as part of his invention of algorithmic probability. He gave a more complete description in his 1964 publications, "A Formal Theory of Inductive Inference," Part 1 and Part 2 in ''Information and Control''.
Andrey Kolmogorov later independently published this theorem in ''Problems Inform. Transmission'' in 1965. Gregory Chaitin also presents this theorem in ''J. ACM'' – Chaitin's paper was submitted October 1966 and revised in December 1968, and cites both Solomonoff's and Kolmogorov's papers.
The theorem says that, among algorithms that decode strings from their descriptions (codes), there exists an optimal one. This algorithm, for all strings, allows codes as short as allowed by any other algorithm up to an additive constant that depends on the algorithms, but not on the strings themselves. Solomonoff used this algorithm and the code lengths it allows to define a "universal probability" of a string on which inductive inference of the subsequent digits of the string can be based. Kolmogorov used this theorem to define several functions of strings, including complexity, randomness, and information.
When Kolmogorov became aware of Solomonoff's work, he acknowledged Solomonoff's priority. For several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was concerned with randomness of a sequence, while Algorithmic Probability became associated with Solomonoff, who focused on prediction using his invention of the universal prior probability distribution. The broader area encompassing descriptional complexity and probability is often called Kolmogorov complexity. The computer scientist Ming Li considers this an example of the Matthew effect: "...to everyone who has, more will be given..."
There are several other variants of Kolmogorov complexity or algorithmic information. The most widely used one is based on self-delimiting programs, and is mainly due to Leonid Levin (1974).
An axiomatic approach to Kolmogorov complexity based on Blum axioms (Blum 1967) was introduced by Mark Burgin in the paper presented for publication by Andrey Kolmogorov.
In the late 1990s and early 2000s, methods developed to approximate Kolmogorov complexity relied on popular compression algorithms like LZW, which made it difficult or impossible to provide any estimate for short strings, until a method based on algorithmic probability was introduced, offering the only alternative to compression-based methods.
Basic results
We write ''K''(''x'', ''y'') to be ''K''((''x'', ''y'')), where (''x'', ''y'') means some fixed way to code for a tuple of strings ''x'' and ''y''.
Inequalities
We omit additive factors of ''O''(1).
Theorem. ''C''(''x'') ≤ ''K''(''x'') ≤ ''C''(''x'') + 2 log2 ''C''(''x'').
Proof. Take any program for the universal Turing machine used to define plain complexity, and convert it to a prefix-free program by first coding the length of the program in binary, then converting the length to prefix-free coding. For example, suppose the program has length 9; then we can convert it as follows:
: 9 → 1001 → 11 00 00 11 → 11 00 00 11 01
where we double each digit, then add a termination code. The prefix-free universal Turing machine can then read in any program for the other machine as follows: it reads the input in digit pairs until it encounters the termination code 01, decoding the doubled pairs to recover the program's length; it then reads that many further symbols as the program itself and simulates the other machine on it.
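The conversion and its decoding can be sketched in Python as follows (the helper names are illustrative; the code mirrors the construction just described):
def encode_length(n):
    # Write n in binary, double each digit, then append the terminator "01".
    return "".join(d + d for d in bin(n)[2:]) + "01"

def make_prefix_free(program):
    # Prepend the doubled binary length, so a reader knows where the program ends.
    return encode_length(len(program)) + program

def read_prefix_free(stream):
    # Read digit pairs until the terminator "01"; the doubled pairs give the length.
    bits, i = "", 0
    while stream[i:i + 2] != "01":
        bits += stream[i]
        i += 2
    n = int(bits, 2)
    # The next n symbols are the program itself.
    return stream[i + 2:i + 2 + n]

p = "110100110"                       # a 9-symbol program for the plain machine
assert read_prefix_free(make_prefix_free(p)) == p
print(make_prefix_free(p))            # 1100001101 followed by the program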