real analysis In mathematics, the branch of real analysis studies the behavior of real numbers, sequences and series of real numbers, and real functions. Some particular properties of real-valued sequences and functions that real analysis studies include co ...

and

approximation theory In mathematics, approximation theory is concerned with how function (mathematics), functions can best be approximation, approximated with simpler functions, and with quantitative property, quantitatively characterization (mathematics), characteri ...

, the Kolmogorov–Arnold representation theorem (or superposition theorem) states that every multivariate

continuous Continuity or continuous may refer to: Mathematics * Continuity (mathematics), the opposing concept to discreteness; common examples include ** Continuous probability distribution or random variable in probability and statistics ** Continuous ...

function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-orie ...

n\to \R

can be represented as a

superposition In mathematics, a linear combination or superposition is an expression constructed from a set of terms by multiplying each term by a constant and adding the results (e.g. a linear combination of ''x'' and ''y'' would be any expression of the form ...

of continuous single-variable functions. The works of

Vladimir Arnold Vladimir Igorevich Arnold (or Arnol'd; , ; 12 June 1937 – 3 June 2010) was a Soviet and Russian mathematician. He is best known for the Kolmogorov–Arnold–Moser theorem regarding the stability of integrable systems, and contributed to s ...

and

Andrey Kolmogorov Andrey Nikolaevich Kolmogorov ( rus, Андре́й Никола́евич Колмого́ров, p=ɐnˈdrʲej nʲɪkɐˈlajɪvʲɪtɕ kəlmɐˈɡorəf, a=Ru-Andrey Nikolaevich Kolmogorov.ogg, 25 April 1903 – 20 October 1987) was a Soviet ...

established that if ''f'' is a multivariate continuous function, then ''f'' can be written as a finite

composition Composition or Compositions may refer to: Arts and literature *Composition (dance), practice and teaching of choreography * Composition (language), in literature and rhetoric, producing a work in spoken tradition and written discourse, to include ...

of continuous functions of a single variable and the

binary operation In mathematics, a binary operation or dyadic operation is a rule for combining two elements (called operands) to produce another element. More formally, a binary operation is an operation of arity two. More specifically, a binary operation ...

addition Addition (usually signified by the Plus and minus signs#Plus sign, plus symbol, +) is one of the four basic Operation (mathematics), operations of arithmetic, the other three being subtraction, multiplication, and Division (mathematics), divis ...

. More specifically, : where

to \R

and

\Phi_\colon \R \to \R

. There are proofs with specific constructions. It solved a more constrained form of Hilbert's thirteenth problem, so the original Hilbert's thirteenth problem is a corollary. In a sense, they showed that the only true continuous multivariate function is the sum, since every other continuous function can be written using

univariate In mathematics, a univariate object is an expression (mathematics), expression, equation, function (mathematics), function or polynomial involving only one Variable (mathematics), variable. Objects involving more than one variable are ''wikt:multi ...

continuous functions and summing.

History

The Kolmogorov–Arnold representation theorem is closely related to

Hilbert's 13th problem Hilbert's thirteenth problem is one of the 23 Hilbert problems set out in a celebrated list compiled in 1900 by David Hilbert. It entails proving whether a solution exists for all 7th-degree equations using algebraic (variant: continuous) fun ...

. In his

Paris Paris () is the Capital city, capital and List of communes in France with over 20,000 inhabitants, largest city of France. With an estimated population of 2,048,472 residents in January 2025 in an area of more than , Paris is the List of ci ...

lecture at the

International Congress of Mathematicians The International Congress of Mathematicians (ICM) is the largest conference for the topic of mathematics. It meets once every four years, hosted by the International Mathematical Union (IMU). The Fields Medals, the IMU Abacus Medal (known before ...

in 1900,

David Hilbert David Hilbert (; ; 23 January 1862 – 14 February 1943) was a German mathematician and philosopher of mathematics and one of the most influential mathematicians of his time. Hilbert discovered and developed a broad range of fundamental idea ...

formulated 23 problems which in his opinion were important for the further development of mathematics. The 13th of these problems dealt with the solution of general equations of higher degrees. It is known that for algebraic equations of degree 4 the solution can be computed by formulae that only contain radicals and arithmetic operations. For higher orders,

Galois theory In mathematics, Galois theory, originally introduced by Évariste Galois, provides a connection between field (mathematics), field theory and group theory. This connection, the fundamental theorem of Galois theory, allows reducing certain problems ...

shows us that the solutions of algebraic equations cannot be expressed in terms of basic algebraic operations. It follows from the so called

Tschirnhaus transformation In mathematics, a Tschirnhaus transformation, also known as Tschirnhausen transformation, is a type of mapping on polynomials developed by Ehrenfried Walther von Tschirnhaus in 1683. Simply, it is a method for transforming a polynomial equation ...

that the general algebraic equation :

x^+a_x^+\cdots +a_=0

can be translated to the form

y^+b_y^+\cdots +b_y+1=0

. The Tschirnhaus transformation is given by a formula containing only radicals and arithmetic operations and transforms. Therefore, the solution of an algebraic equation of degree

n

can be represented as a superposition of functions of two variables if

n<7

and as a superposition of functions of

n-4

variables if

n\geq 7

. For

n=7

the solution is a superposition of arithmetic operations, radicals, and the solution of the equation A further simplification with algebraic transformations seems to be impossible which led to Hilbert's conjecture that "A solution of the general equation of degree 7 cannot be represented as a superposition of continuous functions of two variables". This explains the relation of Hilbert's thirteenth problem to the representation of a higher-dimensional function as superposition of lower-dimensional functions. In this context, it has stimulated many studies in the theory of functions and other related problems by different authors.

Variants

A variant of Kolmogorov's theorem that reduces the number of outer functions

\Phi_

is due to George Lorentz. He showed in 1962 that the outer functions

\Phi_

can be replaced by a single function

\Phi

. More precisely, Lorentz proved the existence of functions

\phi _

q=0,1,\ldots, 2n

p=1,\ldots,n,

such that :

f(\mathbf x) = \sum_^ \Phi\!\left(\sum_^ \phi_(x_)\right).

David Sprecher replaced the inner functions

\phi_

by one single inner function with an appropriate shift in its argument. He proved that there exist real values

\eta, \lambda_1,\ldots,\lambda_n

, a continuous function

\Phi\colon \mathbb \rightarrow \R

, and a real increasing continuous function

\phi\colon,1 \rightarrow,1 /math> with \phi \in \operatorname(\ln 2/\ln (2N+2)), for N \geq n \geq 2, such that

: f(\mathbf x) = \sum_^ \Phi\!\left(\sum_^ \lambda_p \phi(x_+\eta q)+q \right). Phillip A. Ostrand  generalized the Kolmogorov superposition theorem to compact metric spaces. For p=1,\ldots,m let X_p be compact metric spaces of finite dimension n_p and let n = \sum_^ n_p . Then there exists continuous functions \phi_\colon X_p \rightarrow,1 q=0,\ldots,2n, p=1,\ldots,m and continuous functions G_q\colon,1 \rightarrow \R, q=0,\ldots,2n such that any continuous function f\colon X_1 \times \dots \times X_m \rightarrow \mathbb is representable in the form 

: f(x_1,\ldots,x_m) = \sum_^ G_\!\left(\sum_^ \phi_(x_)\right). Kolmogorov–Arnold representation theorem and its aforementioned variants also hold for discontinuous multivariate functions.

Continuous form

In its classic form Kolmogorov–Arnold representation has two layers, where the first, called inner layer, is vector to vector mapping :

s_q = \sum_^ \phi_(x_), \quad q=0,1,..,2n

and the second, outer layer, is vector to scalar mapping :

f(x_1, ... , x_m) = \sum_^ \Phi_q\left(s_q\right).

The transition from discrete to continuous form for inner layer gives equation of Urysohn with 3D kernel :

s(q) = \int_^ F (p),p,q p, \quad q \in_1, q_2

same transition for the outer layer gives its particular case :

f = \int_^ G (q),q q.

The generalization of Kolmogorov-Arnold representation known as Kolmogorov-Arnold network in continuous form is a chain of Urysohn equations, where outer equation also may return function or a vector as multiple related targets. Urysohn equation was introduced in 1924 for a different purpose, as function to function mapping with the problem of finding function

x(p)

, provided

s(q)

and

F (p),p,q /math>.

Limitations

The theorem does not hold in general for complex multi-variate functions, as discussed here. Furthermore, the non-smoothness of the inner functions and their "wild behavior" has limited the practical use of the representation, although there is some debate on this.

Applications

In the field of

machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...

, there have been various attempts to use

neural networks A neural network is a group of interconnected units called neurons that send signals to one another. Neurons can be either Cell (biology), biological cells or signal pathways. While individual neurons are simple, many of them together in a netwo ...

modeled on the Kolmogorov–Arnold representation. In these works, the Kolmogorov–Arnold theorem plays a role analogous to that of the

universal approximation theorem In the mathematical theory of artificial neural networks, universal approximation theorems are theorems of the following form: Given a family of neural networks, for each function f from a certain function space, there exists a sequence of neural ...

in the study of

multilayer perceptron In deep learning, a multilayer perceptron (MLP) is a name for a modern feedforward neural network consisting of fully connected neurons with nonlinear activation functions, organized in layers, notable for being able to distinguish data that is ...

Proof

Here one example is proved. This proof closely follows. A proof for the case of functions depending on two variables is given, as the generalization is immediate.

Setup

* Let

I

be the unit interval

, 1 The comma is a punctuation mark that appears in several variants in different languages. Some typefaces render it as a small line, slightly curved or straight, but inclined from the vertical; others give it the appearance of a miniature fille ...

/math>. * Let

C /math> be the set of continuous functions of type

\to \R. It is a

function space In mathematics, a function space is a set of functions between two fixed sets. Often, the domain and/or codomain will have additional structure which is inherited by the function space. For example, the set of functions from any set into a ve ...

with

supremum norm In mathematical analysis, the uniform norm (or ) assigns, to real- or complex-valued bounded functions defined on a set , the non-negative number :\, f\, _\infty = \, f\, _ = \sup\left\. This norm is also called the , the , the , or, when t ...

(it is a

Banach space In mathematics, more specifically in functional analysis, a Banach space (, ) is a complete normed vector space. Thus, a Banach space is a vector space with a metric that allows the computation of vector length and distance between vectors and ...

). * Let

f

be a continuous function of type

2 \to \R, and let

\, f\,

be the supremum of it on

2. * Let

t

be a positive irrational number. Its exact value is irrelevant. We say that a 5-tuple

(\phi_1, \dots, \phi_5) \in C 5

is a Kolmogorov–Arnold tuple

if and only if In logic and related fields such as mathematics and philosophy, "if and only if" (often shortened as "iff") is paraphrased by the biconditional, a logical connective between statements. The biconditional is true in two cases, where either bo ...

for any

f \in C^2 /math> there exists a continuous function g: \R \to \R, such that f(x,y) = \sum_^5 g(\phi_i(x) + t \phi_i(y)) In the notation, we have the following:

Proof

Fix a

f \in C^2 /math>. We show that a certain subset U_f \subset C 5 is open and dense: There exists continuous g such that \, g\,  < \frac \, f\,, and \Big\,  f(x,y) - \sum_^5 g(\phi_i(x) + t \phi_i(y)) \Big\,  < \frac  \, f\, We can assume that \, f\,  = 1 with no loss of generality.

By continuity, the set of such 5-tuples is open in C 5 . It remains to prove that they are dense.

The key idea is to divide

2 into an overlapping system of small squares, each with a unique address, and define

g

to have the appropriate value at each address.

Grid system

Let

\psi_1 \in C /math>. For any \epsilon > 0, for all large N, we can discretize \psi_1 into a continuous function \phi_1 satisfying the following properties:

* \phi_1 is constant on each of the intervals /5N, 4/5N /5N, 9/5N \dots, -5/5N, 1-1/5N /math>.

* These values are different rational numbers.
* \, \psi_1 - \phi_1\,  < \epsilon .

This function \phi_1 creates a grid address system on

2, divided into streets and blocks. The blocks are of form

\dots

Since

f

is continuous on

2, it is uniformly continuous. Thus, we can take

N

large enough, so that

f

varies by less than

1/7

on any block. On each block,

\phi_1(x) + t \phi_1(y)

has a constant value. The key property is that, because

t

is irrational, and

\phi_1

is rational on the blocks, each block has a different value of

\phi_1(x) + t \phi_1(y)

. So, given any 5-tuple

(\psi_1, \dots, \psi_5)

, we construct such a 5-tuple

(\phi_1, \dots, \phi_5)

. These create 5 overlapping grid systems. Enumerate the blocks as

R_

, where

R_

is the

r

-th block of the grid system created by

\phi_i

. The address of this block is

a_ := \phi_i(x) + t \phi_i(y)

, for any

(x, y) \in R_

. By adding a small and linearly independent irrational number (the construction is similar to that of the

Hamel basis In mathematics, a set of elements of a vector space is called a basis (: bases) if every element of can be written in a unique way as a finite linear combination of elements of . The coefficients of this linear combination are referred to as ...

) to each of

(\phi_1, \dots, \phi_5)

, we can ensure that every block has a unique address. By plotting out the entire grid system, one can see that every point in

2 is contained in 3 to 5 blocks, and 2 to 0 streets.

Construction of ''g''

For each block

R_

, if

f > 0

on all of

R_

then define

g(a_) = +1/7

; if

f < 0

on all of

R_

then define

g(a_) = -1/7

. Now, linearly interpolate

g

between these defined values. It remains to show this construction has the desired properties. For any

(x, y) \in I^2

, we consider three cases. If

f(x, y) \in /7, 7/7 /math>, then by uniform continuity, f > 0 on every block R_that contains the point (x, y) . This means that g = 1/7 on 3 to 5 of the blocks, and have an unknown value on 2 to 0 of the streets. Thus, we have \sum_^5 g(\phi_i(x) + t \phi_i(y)) \in /7, 5/7 giving \Big,  f(x,y) - \sum_^5 g(\phi_i(x) + t \phi_i(y)) \Big,  \in, 6/7 Similarly for f(x, y) \in 7/7, -1/7 /math>.

If f(x, y) \in 1/7, 1/7 /math>, then since \, g\,  \leq 1/7, we still have \Big,  f(x,y) - \sum_^5 g(\phi_i(x) + t \phi_i(y)) \Big,  \in, 6/7

Baire category theorem

Iterating the above construction, then applying the

Baire category theorem The Baire category theorem (BCT) is an important result in general topology and functional analysis. The theorem has two forms, each of which gives sufficient conditions for a topological space to be a Baire space (a topological space such that th ...

, we find that the following kind of 5-tuples are open and dense in

C 5

: There exists a sequence of

g_1, g_2, \dots

such that

\, g_1\,  < \frac \, f\,

\, g_2\,  < \frac7 \frac7 \, f\,

, etc. This allows their sum to be defined:

g := \sum_n g_n

, which is still continuous and bounded, and it satisfies

f(x,y) = \sum_^5 g(\phi_i(x) + t \phi_i(y))

Since

C^2 /math> has a countable dense subset, we can apply the Baire category theorem again to obtain the full theorem.

Extensions

The above proof generalizes for

n

-dimensions: Divide the cube

n into

(2n+1)

interlocking grid systems, such that each point in the cube is on

(n+1)

(2n + 1)

blocks, and

0

n

streets. Now, since

(n+1) > n

, the above construction works. Indeed, this is the best possible value. A relatively short proof is given in via

dimension theory In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coord ...

. In another direction of generality, more conditions can be imposed on the Kolmogorov–Arnold tuples. The proof is given in. (Vituškin, 1954) showed that the theorem is false if we require all functions

f, g,\phi_i

to be continuously differentiable. The theorem remains true if we require all

\phi_i

to be 1-

Lipschitz continuous In mathematical analysis, Lipschitz continuity, named after Germany, German mathematician Rudolf Lipschitz, is a strong form of uniform continuity for function (mathematics), functions. Intuitively, a Lipschitz continuous function is limited in h ...

References

Sources

, "On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables", ''

Proceedings of the USSR Academy of Sciences The ''Proceedings of the USSR Academy of Sciences'' (, ''Doklady Akademii Nauk SSSR'' (''DAN SSSR''), ) was a Soviet journal that was dedicated to publishing original, academic research papers in physics, mathematics, chemistry, geology, and biol ...

'', 108 (1956), pp. 179–182; English translation
''Amer. Math. Soc. Transl.'', "17: Twelve Papers on Algebra and Real Functions" (1961)
pp. 369–373. *

, "On functions of three variables", ''

'', 114 (1957), pp. 679–681; English translation
''Amer. Math. Soc. Transl.'', "28: Sixteen Papers on Analysis" (1963)
pp. 51–54
SpringerLink
*Vladimir Arnold, "On the representation of continuous functions of three variables as superpositions of continuous functions of two variables", ''Dokl. Akad. Nauk. SSSR'' 114:4 (1957), pp. 679–681 (in Russian
SpringerLink
*Andrey Kolmogorov, "On the representation of continuous functions of several variables as superpositions of continuous functions of one variable and addition", (1957); English translation
''Amer. Math. Soc. Transl.'', "28: Sixteen Papers on Analysis" (1963)PDF

History

Variants

Continuous form

Limitations

Applications

Proof

Setup

Proof

Grid system

Construction of ''g''

Baire category theorem

Extensions

References

Sources

Further reading