In 1973, Andrey Kolmogorov proposed a non-probabilistic approach to statistics and model selection. Let each datum be a finite binary string and a model be a finite set of binary strings. Consider model classes consisting of models of given maximal Kolmogorov complexity.
The Kolmogorov structure function of an individual data string expresses the relation between the complexity-level constraint on a model class and the least log-cardinality of a model in the class containing the data. The structure function determines all stochastic properties of the individual data string: for every constrained model class it determines the individual best-fitting model in the class, irrespective of whether the true model is in the model class considered or not. In the classical case we talk about a set of data with a probability distribution, and the properties are those of the expectations. In contrast, here we deal with individual data strings and the properties of the individual string in question. In this setting, a property holds with certainty rather than with high probability as in the classical case. The Kolmogorov structure function precisely quantifies the goodness-of-fit of an individual model with respect to individual data.
The Kolmogorov structure function is used in algorithmic information theory, also known as the theory of Kolmogorov complexity, for describing the structure of a string by use of models of increasing complexity.
Kolmogorov's definition
The structure function was originally proposed by Kolmogorov in 1973 at a Soviet Information Theory symposium in Tallinn, but these results were not published. The results were announced in 1974, the only written record by Kolmogorov himself. One of his last scientific statements is (translated from the original Russian by L.A. Levin):
Contemporary definition
It is discussed in Cover and Thomas. It is extensively studied in Vereshchagin and Vitányi, where the main properties are also resolved.
The Kolmogorov structure function can be written as
: $h_x(\alpha) = \min_S \{\log |S| : x \in S,\ K(S) \le \alpha\},$
where $x$ is a binary string of length $n$ with $x \in S$, $S$ is a contemplated model (set of $n$-length strings) for $x$, $K(S)$ is the Kolmogorov complexity of $S$, and $\alpha$ is a nonnegative integer value bounding the complexity of the contemplated $S$'s. Clearly, this function is nonincreasing and reaches $h_x(\alpha) = 0$ for $\alpha = K(x) + c$, where $c$ is the required number of bits to change $x$ into $\{x\}$ and $K(x)$ is the Kolmogorov complexity of $x$.
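Both $K$ and hence $h_x$ are incomputable, so the definition cannot be evaluated directly. The following Python sketch only illustrates its shape: it restricts the contemplated models to a small hypothetical family (the sets $S_i$ of all length-$n$ strings agreeing with $x$ on their first $i$ bits) and uses compressed length as a stand-in for $K(S_i)$, in the spirit of the compressor experiments mentioned in the last section; the function names are illustrative, not from the source.

<syntaxhighlight lang="python">
import zlib

def approx_complexity_bits(description: bytes) -> int:
    """Crude stand-in for the Kolmogorov complexity of a model: the length,
    in bits, of a zlib-compressed canonical description.  K itself is
    incomputable; a compressor only gives an upper bound."""
    return 8 * len(zlib.compress(description, 9))

def structure_function_profile(x: str):
    """Toy profile of h_x over a restricted model family.

    Models considered: S_i = all strings of len(x) bits agreeing with x on
    the first i bits (i = 0 .. len(x)).  Then log2|S_i| = len(x) - i exactly,
    and the compressed length of the defining prefix stands in for K(S_i).
    The true structure function minimises over *all* finite sets containing x."""
    n = len(x)
    profile = [(approx_complexity_bits(x[:i].encode()), n - i)
               for i in range(n + 1)]

    def h(alpha: int) -> int:
        # least log-cardinality of a model in the family with cost <= alpha
        feasible = [log_card for cost, log_card in profile if cost <= alpha]
        return min(feasible) if feasible else n

    return h

h = structure_function_profile("01" * 32)   # a very regular 64-bit string
for alpha in (0, 64, 128, 256, 512):
    print(alpha, h(alpha))
</syntaxhighlight>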
The algorithmic sufficient statistic
We define a set $S$ containing $x$ such that
: $K(S) + K(x \mid S) = K(x) + O(1).$
The function $h_x(\alpha)$ never decreases more than a fixed independent constant below the diagonal called the sufficiency line L defined by
: $L(\alpha) = K(x) - \alpha.$
It is approached to within a constant distance by the graph of $h_x$ for certain arguments (for instance, for $\alpha = K(x) + c$). For these $\alpha$'s we have $\alpha + h_x(\alpha) = K(x) + O(1)$, and the associated model $S$ (witness for $h_x(\alpha)$) is called an optimal set for $x$, and its description of $K(S) \le \alpha$ bits is therefore an ''algorithmic sufficient statistic''. We write `algorithmic' for `Kolmogorov complexity' by convention. The main properties of an algorithmic sufficient statistic are the following: If $S$ is an algorithmic sufficient statistic for $x$, then
: $K(S) + \log |S| = K(x) + O(1).$
That is, the two-part description of $x$ using the model $S$ and as data-to-model code the index of $x$ in the enumeration of $S$ in $\log |S|$ bits, is as concise as the shortest one-part code of $x$ in $K(x)$ bits. This can be easily seen as follows:
: $K(x) \le K(x, S) + O(1) \le K(S) + K(x \mid S) + O(1) \le K(S) + \log |S| + O(1),$

using straightforward inequalities and the sufficiency property, we find that $K(x \mid S) = \log |S| + O(1)$. (For example, given $S \ni x$, we can describe $x$ self-delimitingly (you can determine its end) in $\log |S| + O(1)$ bits.) Therefore, the ''randomness deficiency'' $\log |S| - K(x \mid S)$ of $x$ in $S$ is a constant, which means that $x$ is a typical (random) element of S. However, there can be models $S$ containing $x$ that are not sufficient statistics. An algorithmic sufficient statistic $S$ for $x$ has the additional property, apart from being a model of best fit, that $K(x, S) = K(x) + O(1)$ and therefore by the Kolmogorov complexity ''symmetry of information'' (the information about $x$ in $S$ is about the same as the information about $S$ in x) we have $K(S \mid x^*) = O(1)$: the algorithmic sufficient statistic $S$ is a model of best fit that is almost completely determined by $x$. ($x^*$ is a shortest program for $x$.) The algorithmic sufficient statistic associated with the least such $\alpha$ is called the algorithmic minimal sufficient statistic.
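A standard illustrative example, in the spirit of the literature cited below rather than a statement taken from this article: let $x$ be a binary string of length $n$ containing exactly $k$ ones whose positions are algorithmically random given $n$ and $k$, and let $S$ be the set of all length-$n$ strings with exactly $k$ ones. Then
: $K(S) = O(\log n), \qquad \log |S| = \log \binom{n}{k},$
and because the index of $x$ in $S$ cannot be compressed further,
: $K(x) = K(S) + \log |S| + O(\log n),$
so $S$ is an algorithmic sufficient statistic for $x$ up to the usual logarithmic terms.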
With respect to the picture: The MDL structure function $\lambda_x(\alpha)$ is explained below. The goodness-of-fit structure function $\beta_x(\alpha)$ is the least randomness deficiency (see above) of any model $S$ for $x$ such that $K(S) \le \alpha$. This structure function gives the goodness-of-fit of a model $S$ (containing x) for the string x. When it is low the model fits well, and when it is high the model doesn't fit well. If $\beta_x(\alpha) = 0$ for some $\alpha$, then there is a typical model $S$ for $x$ such that $K(S) \le \alpha$ and $x$ is typical (random) for S. That is, $S$ is the best-fitting model for x. For more details see the literature below.
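In symbols, the goodness-of-fit structure function described above can be written as (a standard formulation, stated here for concreteness)
: $\beta_x(\alpha) = \min_S \{\log |S| - K(x \mid S) : x \in S,\ K(S) \le \alpha\}.$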
Selection of properties
Within the constraints that the graph goes down at an angle of at least 45 degrees, that it starts at $n$ and ends approximately at $K(x)$, every graph (up to a logarithmic additive term in argument and value) is realized by the structure function of some data $x$ and vice versa. Where the graph hits the diagonal first, the argument (complexity) is that of the minimal sufficient statistic. It is incomputable to determine this place.
Main property
It is proved that at each level $\alpha$ of complexity the structure function allows us to select the best model $S$ for the individual string x within a strip of $O(\log n)$ with certainty, not with great probability.
The MDL variant
The Minimum description length (MDL) function: The length of the minimal two-part code for x consisting of the model cost K(S) and the length of the index of x in S, in the model class of sets of given maximal Kolmogorov complexity $\alpha$, the complexity of S upper bounded by $\alpha$, is given by the MDL function or constrained MDL estimator:
: $\lambda_x(\alpha) = \min_S \{\Lambda(S) : S \ni x,\ K(S) \le \alpha\},$
where $\Lambda(S) = \log |S| + K(S)$ is the total length of the two-part code of x with help of model S.
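As with $h_x$, the function $\lambda_x$ itself is incomputable. A minimal sketch, assuming the same toy prefix-model family and compressor stand-in for $K(S)$ as the earlier sketch (illustrative assumptions, not the article's method), performs the constrained two-part minimisation:

<syntaxhighlight lang="python">
import zlib
from math import inf

def model_cost_bits(prefix: bytes) -> int:
    # Stand-in for K(S_i): compressed length, in bits, of the defining prefix.
    return 8 * len(zlib.compress(prefix, 9))

def constrained_mdl(x: str, alpha: int):
    """Toy analogue of lambda_x(alpha) over the prefix models S_i:
    minimise the two-part length K(S_i) + log2|S_i| subject to K(S_i) <= alpha."""
    n = len(x)
    best_length, best_model = inf, None
    for i in range(n + 1):
        cost = model_cost_bits(x[:i].encode())
        if cost > alpha:
            continue                      # model too complex for this level
        two_part = cost + (n - i)         # log2|S_i| = n - i exactly
        if two_part < best_length:
            best_length, best_model = two_part, i
    return best_length, best_model

x = "0110" * 32                           # regular 128-bit string
for alpha in (64, 128, 512):
    print(alpha, constrained_mdl(x, alpha))
</syntaxhighlight>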
Main property
It is proved that at each level $\alpha$ of complexity the structure function allows us to select the best model S for the individual string x within a strip of $O(\log n)$ with certainty, not with great probability.
Application in statistics
The mathematics developed above were taken as the foundation of MDL by its inventor Jorma Rissanen.
Probability models
For every computable probability distribution $P$ it can be proved that (with $S$ a suitably chosen finite-set model for $x$)
: $-\log P(x) = \log |S| + O(\log n).$
For example, if $P$ is some computable distribution on the set $S$ of strings of length $n$, then each $x \in S$ has probability $P(x) = 1/|S|$. Kolmogorov's structure function becomes
: $h'_x(\alpha) = \min_P \{-\log P(x) : P(x) > 0,\ K(P) \le \alpha\},$
where x is a binary string of length n with $P(x) > 0$, $P$ is a contemplated model (computable probability distribution on $n$-length strings) for $x$, $K(P)$ is the Kolmogorov complexity of $P$, and $\alpha$ is an integer value bounding the complexity of the contemplated $P$'s. Clearly, this function is non-increasing and reaches $h'_x(\alpha) = 0$ for $\alpha = K(x) + c$, where c is the required number of bits to change $x$ into $\{x\}$ and $K(x)$ is the Kolmogorov complexity of $x$. Then $h'_x(\alpha) = h_x(\alpha) + O(\log n)$. For every complexity level $\alpha$, the function $h'_x(\alpha)$ is the Kolmogorov complexity version of the maximum likelihood (ML).
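As a hedged illustration of $h'_x$ as complexity-constrained maximum likelihood, the sketch below restricts the contemplated probability models to i.i.d. Bernoulli distributions whose parameter is a dyadic rational written with at most $\alpha$ bits, and uses that bit count as a crude stand-in for $K(P)$; the model family and helper names are assumptions of this illustration, not part of the theory above.

<syntaxhighlight lang="python">
from math import log2

def neg_log_likelihood(x: str, p: float) -> float:
    """-log2 P(x) for an i.i.d. Bernoulli(p) model over the bits of x."""
    ones = x.count("1")
    zeros = len(x) - ones
    if p == 0.0:
        return 0.0 if ones == 0 else float("inf")
    if p == 1.0:
        return 0.0 if zeros == 0 else float("inf")
    return -(ones * log2(p) + zeros * log2(1 - p))

def constrained_ml(x: str, alpha: int) -> float:
    """Toy analogue of h'_x(alpha): least -log2 P(x) over Bernoulli models
    whose dyadic parameter uses at most alpha bits of precision (a crude
    stand-in for the constraint K(P) <= alpha)."""
    best = float("inf")
    for bits in range(1, alpha + 1):
        for k in range(2 ** bits + 1):       # p runs over k / 2**bits
            best = min(best, neg_log_likelihood(x, k / 2 ** bits))
    return best

x = "1101011101111011"
for alpha in (1, 2, 4, 8):
    print(alpha, round(constrained_ml(x, alpha), 2))
</syntaxhighlight>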
Main property
It is proved that at each level
of complexity the structure function allows us to select the best model
for the individual string
within a strip of
with certainty, not with great probability.
The MDL variant and probability models
The MDL function: The length of the minimal two-part code for x, consisting of the model cost K(P) and the data-to-model code length $-\log P(x)$, in the model class of computable probability mass functions of given maximal Kolmogorov complexity $\alpha$, the complexity of P upper bounded by $\alpha$, is given by the MDL function or constrained MDL estimator:
: $\lambda'_x(\alpha) = \min_P \{\Lambda(P) : P(x) > 0,\ K(P) \le \alpha\},$
where $\Lambda(P) = -\log P(x) + K(P)$ is the total length of the two-part code of x with help of model P.
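Under the same illustrative assumptions as the previous sketch (i.i.d. Bernoulli models with a dyadic parameter, parameter precision standing in for $K(P)$), a minimal sketch of the constrained MDL estimator adds the model cost to the data code length and minimises the sum:

<syntaxhighlight lang="python">
from math import log2

def neg_log_likelihood(x: str, p: float) -> float:
    """-log2 P(x) for an i.i.d. Bernoulli(p) model over the bits of x."""
    ones = x.count("1")
    zeros = len(x) - ones
    if p == 0.0:
        return 0.0 if ones == 0 else float("inf")
    if p == 1.0:
        return 0.0 if zeros == 0 else float("inf")
    return -(ones * log2(p) + zeros * log2(1 - p))

def constrained_mdl_prob(x: str, alpha: int):
    """Toy analogue of lambda'_x(alpha): minimise (model cost) + (-log2 P(x))
    over Bernoulli models whose dyadic parameter needs at most alpha bits;
    the bit count is only a stand-in for K(P)."""
    best_length, best_p = float("inf"), None
    for bits in range(1, alpha + 1):
        for k in range(1, 2 ** bits):        # skip the degenerate p = 0, 1
            p = k / 2 ** bits
            total = bits + neg_log_likelihood(x, p)
            if total < best_length:
                best_length, best_p = total, p
    return best_length, best_p

x = "1110111011101110" * 4
for alpha in (1, 2, 6):
    print(alpha, constrained_mdl_prob(x, alpha))
</syntaxhighlight>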
Main property
It is proved that at each level $\alpha$ of complexity the MDL function allows us to select the best model P for the individual string x within a strip of $O(\log n)$ with certainty, not with great probability.
Extension to rate distortion and denoising
It turns out that the approach can be extended to a theory of rate distortion of individual finite sequences and denoising of individual finite sequences using Kolmogorov complexity. Experiments using real compressor programs have been carried out with success.
Here the assumption is that for natural data the Kolmogorov complexity is not far from the length of a compressed version using a good compressor.
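This assumption can be made concrete with any off-the-shelf compressor. The sketch below uses Python's zlib (a choice of convenience, not necessarily the compressor used in the cited experiments) to upper-bound $K$ by compressed length; regular data compresses far below its literal length, while random data does not.

<syntaxhighlight lang="python">
import os
import zlib

def approx_K_bits(data: bytes) -> int:
    """Upper bound on K(data) in bits: the length of a zlib-compressed copy.
    A real compressor can only overestimate Kolmogorov complexity."""
    return 8 * len(zlib.compress(data, 9))

structured = b"0101010101010101" * 64   # highly regular: compresses well
noisy = os.urandom(1024)                # incompressible with high probability

print(len(structured) * 8, approx_K_bits(structured))
print(len(noisy) * 8, approx_K_bits(noisy))
</syntaxhighlight>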
Literature
* Especially pp. 401–431 about the Kolmogorov structure function, and pp. 613–629 about rate distortion and denoising of individual sequences.