In functional analysis (a branch of mathematics), a reproducing kernel Hilbert space (RKHS) is a Hilbert space of functions in which point evaluation is a continuous linear functional. Roughly speaking, this means that if two functions f and g in the RKHS are close in norm, i.e., \|f - g\| is small, then f and g are also pointwise close, i.e., |f(x) - g(x)| is small for all x. The converse need not be true. Informally, this can be seen by looking at the supremum norm: the sequence of functions \sin^n(x) converges pointwise (on, say, [0, \pi]), but does not converge uniformly, i.e., does not converge with respect to the supremum norm. (This is not a counterexample, because the supremum norm does not arise from any inner product, as it does not satisfy the parallelogram law.)

It is not entirely straightforward to construct a Hilbert space of functions which is not an RKHS; some examples, however, have been found. Note that ''L''^2 spaces are not Hilbert spaces of functions (and hence not RKHSs), but rather Hilbert spaces of equivalence classes of functions (for example, the functions f and g defined by f(x) = 0 and g(x) = 1_{\mathbb{Q}}, the indicator function of the rationals, are equivalent in ''L''^2). However, there are RKHSs in which the norm is an ''L''^2-norm, such as the space of band-limited functions (see the example below).

An RKHS is associated with a kernel that reproduces every function in the space, in the sense that for every x in the set on which the functions are defined, "evaluation at x" can be performed by taking an inner product with a function determined by the kernel. Such a ''reproducing kernel'' exists if and only if every evaluation functional is continuous.

The reproducing kernel was first introduced in the 1907 work of Stanisław Zaremba concerning boundary value problems for harmonic and biharmonic functions. James Mercer simultaneously examined functions which satisfy the reproducing property in the theory of integral equations. The idea of the reproducing kernel remained untouched for nearly twenty years until it appeared in the dissertations of Gábor Szegő, Stefan Bergman, and Salomon Bochner. The subject was eventually systematically developed in the early 1950s by Nachman Aronszajn and Stefan Bergman.

These spaces have wide applications, including complex analysis, harmonic analysis, and quantum mechanics. Reproducing kernel Hilbert spaces are particularly important in the field of statistical learning theory because of the celebrated representer theorem, which states that every function in an RKHS that minimizes an empirical risk functional can be written as a linear combination of the kernel function evaluated at the training points. This is a practically useful result, as it effectively reduces the empirical risk minimization problem from an infinite-dimensional to a finite-dimensional optimization problem.

For ease of understanding, we provide the framework for real-valued Hilbert spaces. The theory can be easily extended to spaces of complex-valued functions, and hence includes the many important examples of reproducing kernel Hilbert spaces that are spaces of analytic functions.
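To make the practical impact of the representer theorem concrete, here is a minimal sketch of kernel ridge regression in Python with NumPy. The Gaussian kernel, the synthetic data, and the regularization constant are all illustrative assumptions, not part of the theorem itself; the point is only that the minimizer over the infinite-dimensional RKHS is found by solving for n coefficients.

```python
# Sketch: kernel ridge regression as an instance of the representer theorem.
# Kernel, data, and regularization constant are illustrative assumptions.
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Gram matrix K[i, j] = exp(-||X_i - Y_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))                 # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)  # noisy targets

# Representer theorem: the minimizer of the regularized empirical risk has
# the form f(.) = sum_i alpha_i K(x_i, .); for the squared loss, alpha
# solves the finite-dimensional linear system (K + n*lam*I) alpha = y.
lam = 1e-2
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), y)

X_new = np.array([[0.5], [1.0]])
f_new = gaussian_kernel(X_new, X) @ alpha            # evaluate the minimizer
print(f_new)
```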


Definition

Let X be an arbitrary set and H a Hilbert space of real-valued functions on X, equipped with pointwise addition and pointwise scalar multiplication. The evaluation functional over the Hilbert space of functions H is a linear functional that evaluates each function at a point x,

: L_x : f \mapsto f(x), \quad \forall f \in H.

We say that ''H'' is a reproducing kernel Hilbert space if, for all x in X, L_x is continuous at every f in H or, equivalently, if L_x is a bounded operator on H, i.e. there exists some M_x > 0 such that

: |L_x(f)| := |f(x)| \le M_x \, \|f\|_H, \quad \forall f \in H. \qquad (1)

Although M_x < \infty is assumed for all x \in X, it might still be the case that \sup_x M_x = \infty.

While property (1) is the weakest condition that ensures both the existence of an inner product and the evaluation of every function in H at every point in the domain, it does not lend itself to easy application in practice. A more intuitive definition of the RKHS can be obtained by observing that this property guarantees that the evaluation functional can be represented by taking the inner product of f with a function K_x in H. This function is the so-called reproducing kernel for the Hilbert space H, from which the RKHS takes its name. More formally, the Riesz representation theorem implies that for all x in X there exists a unique element K_x of H with the reproducing property,

: f(x) = L_x(f) = \langle f,\ K_x \rangle_H, \quad \forall f \in H. \qquad (2)

Since K_x is itself a function defined on X with values in the field \mathbb{R} (or \mathbb{C} in the case of complex Hilbert spaces) and since K_x is in H, we have that

: K_x(y) = L_y(K_x) = \langle K_x,\ K_y \rangle_H,

where K_y \in H is the element of H associated to L_y. This allows us to define the reproducing kernel of H as a function K : X \times X \to \mathbb{R} (or \mathbb{C} in the complex case) by

: K(x, y) = \langle K_x,\ K_y \rangle_H.

From this definition it is easy to see that K is both symmetric (resp. conjugate symmetric) and positive definite, i.e.

: \sum_{i,j=1}^n c_i c_j K(x_i, x_j) = \sum_{i=1}^n c_i \left\langle K_{x_i}, \sum_{j=1}^n c_j K_{x_j} \right\rangle_H = \left\langle \sum_{i=1}^n c_i K_{x_i}, \sum_{j=1}^n c_j K_{x_j} \right\rangle_H = \left\| \sum_{i=1}^n c_i K_{x_i} \right\|_H^2 \ge 0

for every n \in \mathbb{N}, x_1, \dots, x_n \in X, \text{ and } c_1, \dots, c_n \in \mathbb{R}. The Moore–Aronszajn theorem (see below) is a sort of converse to this: if a function K satisfies these conditions, then there is a Hilbert space of functions on X for which it is a reproducing kernel.
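The symmetry and positive definiteness just derived are easy to observe numerically. Below is a small sketch, assuming a Gaussian kernel and arbitrary sample points (both purely illustrative choices), that builds a Gram matrix and checks both properties.

```python
# Sketch: checking symmetry and positive semi-definiteness of a Gram matrix.
import numpy as np

def k(x, y, sigma=0.5):
    # Gaussian kernel on R; an illustrative positive definite kernel
    return np.exp(-(x - y) ** 2 / (2 * sigma ** 2))

x = np.linspace(-1, 1, 8)
K = k(x[:, None], x[None, :])          # K[i, j] = k(x_i, x_j)

assert np.allclose(K, K.T)             # symmetry: K(x, y) = K(y, x)
eigvals = np.linalg.eigvalsh(K)
assert eigvals.min() > -1e-12          # c^T K c = ||sum_i c_i K_{x_i}||^2 >= 0
print("smallest eigenvalue:", eigvals.min())
```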


Example

The space of bandlimited continuous functions H is an RKHS, as we now show. Formally, fix some cutoff frequency 0 < a < \infty and define the Hilbert space

: H = \{ f \in C(\mathbb{R}) \mid \operatorname{supp}(F) \subset [-a, a] \}

where C(\mathbb{R}) is the set of continuous square integrable functions, and

: F(\omega) = \int_{-\infty}^\infty f(t) e^{-i \omega t} \, dt

is the Fourier transform of f. As the inner product of this Hilbert space, we use

: \langle f, g \rangle_{L^2} = \int_{-\infty}^\infty f(x) \cdot \overline{g(x)} \, dx.

From the Fourier inversion theorem, we have

: f(x) = \frac{1}{2\pi} \int_{-a}^a F(\omega) e^{i x \omega} \, d\omega.

It then follows by the Cauchy–Schwarz inequality and Plancherel's theorem that, for all x,

: |f(x)| \le \frac{1}{2\pi} \sqrt{2a} \sqrt{\int_{-a}^a |F(\omega)|^2 \, d\omega} = \frac{\sqrt{2a}}{2\pi} \sqrt{2\pi \int_{-\infty}^\infty |f(t)|^2 \, dt} = \sqrt{\frac{a}{\pi}} \, \|f\|_{L^2}.

This inequality shows that the evaluation functional is bounded, proving that H is indeed an RKHS. The kernel function K_x in this case is given by

: K_x(y) = \frac{a}{\pi} \operatorname{sinc}\left( \frac{a}{\pi} (y - x) \right) = \frac{\sin(a(y - x))}{\pi (y - x)}.

To see this, we first note that the Fourier transform of K_x(y) defined above is given by

: \int_{-\infty}^\infty K_x(y) e^{-i \omega y} \, dy = \begin{cases} e^{-i \omega x} & \text{if } \omega \in [-a, a], \\ 0 & \text{otherwise}, \end{cases}

which is a consequence of the time-shifting property of the Fourier transform. Consequently, using Plancherel's theorem, we have

: \langle f, K_x \rangle_{L^2} = \int_{-\infty}^\infty f(y) \cdot \overline{K_x(y)} \, dy = \frac{1}{2\pi} \int_{-a}^a F(\omega) \cdot e^{i \omega x} \, d\omega = f(x).

Thus we obtain the reproducing property of the kernel. Note that K_x in this case is the "bandlimited version" of the Dirac delta function, and that K_x(y) converges to \delta(y - x) in the weak sense as the cutoff frequency a tends to infinity.
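The reproducing property can also be checked numerically. The following sketch approximates \langle f, K_x \rangle_{L^2} for a band-limited f with a crude Riemann sum on a truncated grid and compares it with f(x); the cutoff a, the test point, the grid, and the choice of f (itself a shifted kernel section, hence band-limited) are all illustrative.

```python
# Sketch: numerically verifying <f, K_x> = f(x) for the sinc kernel.
import numpy as np

a = 2.0  # cutoff frequency

def K_(x, y):
    # K_x(y) = sin(a(y - x)) / (pi (y - x)); note np.sinc(t) = sin(pi t)/(pi t)
    return (a / np.pi) * np.sinc(a * (y - x) / np.pi)

f = lambda y: K_(0.3, y)                 # a band-limited test function
y = np.linspace(-300.0, 300.0, 1_200_001)
x0 = 1.1
inner = (f(y) * K_(x0, y)).sum() * (y[1] - y[0])  # crude <f, K_{x0}>_{L^2}
print(inner, f(x0))                      # the two values should nearly agree
```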


Moore–Aronszajn theorem

We have seen how a reproducing kernel Hilbert space defines a reproducing kernel function that is both symmetric and positive definite. The Moore–Aronszajn theorem goes in the other direction; it states that every symmetric, positive definite kernel defines a unique reproducing kernel Hilbert space. The theorem first appeared in Aronszajn's ''Theory of Reproducing Kernels'', although he attributes it to E. H. Moore.

:Theorem. Suppose ''K'' is a symmetric, positive definite kernel on a set ''X''. Then there is a unique Hilbert space of functions on ''X'' for which ''K'' is a reproducing kernel.

Proof. For all ''x'' in ''X'', define ''K_x'' = ''K''(''x'', ⋅ ). Let ''H''_0 be the linear span of \{K_x : x \in X\}. Define an inner product on ''H''_0 by

: \left\langle \sum_{j=1}^n b_j K_{y_j}, \sum_{i=1}^m a_i K_{x_i} \right\rangle_{H_0} = \sum_{i=1}^m \sum_{j=1}^n a_i b_j K(y_j, x_i),

which implies K(x, y) = \left\langle K_x, K_y \right\rangle_{H_0}. The symmetry of this inner product follows from the symmetry of ''K'', and the non-degeneracy follows from the fact that ''K'' is positive definite.

Let ''H'' be the completion of ''H''_0 with respect to this inner product. Then ''H'' consists of functions of the form

: f(x) = \sum_{i=1}^\infty a_i K_{x_i}(x) \quad \text{where} \quad \lim_{n \to \infty} \sup_{p \ge 0} \left\| \sum_{i=n}^{n+p} a_i K_{x_i} \right\|_{H_0} = 0.

Now we can check the reproducing property (2):

: \langle f, K_x \rangle_H = \sum_{i=1}^\infty a_i \left\langle K_{x_i}, K_x \right\rangle_{H_0} = \sum_{i=1}^\infty a_i K(x_i, x) = f(x).

To prove uniqueness, let ''G'' be another Hilbert space of functions for which ''K'' is a reproducing kernel. For every ''x'' and ''y'' in ''X'', (2) implies that

: \langle K_x, K_y \rangle_H = K(x, y) = \langle K_x, K_y \rangle_G.

By linearity, \langle \cdot, \cdot \rangle_H = \langle \cdot, \cdot \rangle_G on the span of \{K_x : x \in X\}. Then H \subset G because ''G'' is complete and contains ''H''_0 and hence contains its completion.

Now we need to prove that every element of ''G'' is in ''H''. Let f be an element of ''G''. Since ''H'' is a closed subspace of ''G'', we can write f = f_H + f_{H^\bot} where f_H \in H and f_{H^\bot} \in H^\bot. Now if x \in X then, since ''K'' is a reproducing kernel of ''G'' and of ''H'':

: f(x) = \langle K_x, f \rangle_G = \langle K_x, f_H \rangle_G + \langle K_x, f_{H^\bot} \rangle_G = \langle K_x, f_H \rangle_G = \langle K_x, f_H \rangle_H = f_H(x),

where we have used the fact that K_x belongs to ''H'', so that its inner product with f_{H^\bot} in ''G'' is zero. This shows that f = f_H in ''G'' and concludes the proof.
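The pre-Hilbert space ''H''_0 in the proof is easy to work with concretely: its elements are finite combinations of kernel sections, and every inner product reduces to kernel evaluations. A short sketch, with a Gaussian kernel and arbitrary points and coefficients chosen purely for illustration:

```python
# Sketch: inner products in H_0 = span{K_x} computed via the kernel alone.
import numpy as np

def K(x, y, sigma=1.0):
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2 * sigma ** 2))

xs = np.array([0.0, 0.5, 2.0]); a = np.array([1.0, -2.0, 0.5])  # f = sum a_i K_{x_i}
ys = np.array([0.2, 1.5]);      b = np.array([0.3, 0.7])        # g = sum b_j K_{y_j}

inner_fg  = a @ K(xs, ys) @ b   # <f, g>_{H_0} = sum_{i,j} a_i b_j K(x_i, y_j)
norm_f_sq = a @ K(xs, xs) @ a   # ||f||^2 >= 0 by positive definiteness
t = 0.8
f_t = a @ K(xs, np.array([t]))  # <f, K_t>_{H_0} = f(t), the reproducing property
print(inner_fg, norm_f_sq, f_t[0])
```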


Integral operators and Mercer's theorem

We may characterize a symmetric positive definite kernel K via the integral operator, using Mercer's theorem, and obtain an additional view of the RKHS. Let X be a compact space equipped with a strictly positive finite Borel measure \mu and K : X \times X \to \mathbb{R} a continuous, symmetric, and positive definite function. Define the integral operator T_K : L_2(X) \to L_2(X) as

: [T_K f](\cdot) = \int_X K(\cdot, t) f(t) \, d\mu(t)

where L_2(X) is the space of square integrable functions with respect to \mu.

Mercer's theorem states that the spectral decomposition of the integral operator T_K of K yields a series representation of K in terms of the eigenvalues and eigenfunctions of T_K. This then implies that K is a reproducing kernel, so that the corresponding RKHS can be defined in terms of these eigenvalues and eigenfunctions. We provide the details below.

Under these assumptions, T_K is a compact, continuous, self-adjoint, and positive operator. The spectral theorem for self-adjoint operators implies that there is an at most countable decreasing sequence (\sigma_i)_i \ge 0 such that \lim_{i \to \infty} \sigma_i = 0 and T_K \varphi_i(x) = \sigma_i \varphi_i(x), where the \{\varphi_i\} form an orthonormal basis of L_2(X). By the positivity of T_K, \sigma_i > 0 for all i. One can also show that T_K maps continuously into the space of continuous functions C(X), and therefore we may choose continuous functions as the eigenvectors, that is, \varphi_i \in C(X) for all i. Then by Mercer's theorem K may be written in terms of the eigenvalues and continuous eigenfunctions as

: K(x, y) = \sum_{j=1}^\infty \sigma_j \, \varphi_j(x) \, \varphi_j(y)

for all x, y \in X such that

: \lim_{n \to \infty} \sup_{u, v} \left| K(u, v) - \sum_{j=1}^n \sigma_j \, \varphi_j(u) \, \varphi_j(v) \right| = 0.

This series representation is referred to as a Mercer kernel or Mercer representation of K. Furthermore, it can be shown that the RKHS H of K is given by

: H = \left\{ f \in L_2(X) \,\middle|\, \sum_{i=1}^\infty \frac{\left\langle f, \varphi_i \right\rangle_{L_2}^2}{\sigma_i} < \infty \right\}

where the inner product of H is given by

: \left\langle f, g \right\rangle_H = \sum_{i=1}^\infty \frac{\left\langle f, \varphi_i \right\rangle_{L_2} \left\langle g, \varphi_i \right\rangle_{L_2}}{\sigma_i}.

This representation of the RKHS has application in probability and statistics, for example to the Karhunen–Loève representation for stochastic processes and kernel PCA.
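Discretizing T_K gives a hands-on view of the Mercer representation. In the sketch below, a Nyström-style approximation, \mu is taken to be the uniform measure on [0, 1] and K(x, y) = \min(x, y) (both illustrative choices); eigenpairs of the weighted kernel matrix stand in for (\sigma_i, \varphi_i), and the truncated series approximately rebuilds K.

```python
# Sketch: discretized Mercer decomposition of K(x, y) = min(x, y) on [0, 1].
import numpy as np

n = 200
x = np.linspace(0.0, 1.0, n)
w = 1.0 / n                                   # quadrature weight for d(mu)
K = np.minimum.outer(x, x)                    # continuous, symmetric, PSD kernel

evals, evecs = np.linalg.eigh(K * w)          # discretization of T_K
evals, evecs = evals[::-1], evecs[:, ::-1]    # decreasing sigma_1 >= sigma_2 >= ...
phi = evecs / np.sqrt(w)                      # approximately L2(mu)-normalized phi_i

m = 20                                        # truncation order of the series
K_m = (phi[:, :m] * evals[:m]) @ phi[:, :m].T # sum_{j<=m} sigma_j phi_j(x) phi_j(y)
print("max truncation error:", np.abs(K - K_m).max())
```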


Feature maps

A feature map is a map \varphi \colon X \rightarrow F, where F is a Hilbert space which we will call the feature space. The first sections presented the connection between bounded/continuous evaluation functionals, positive definite functions, and integral operators, and in this section we provide another representation of the RKHS in terms of feature maps.

We first note that every feature map defines a kernel via

: K(x, y) = \langle \varphi(x), \varphi(y) \rangle_F. \qquad (3)

Clearly K is symmetric, and positive definiteness follows from the properties of the inner product in F. Conversely, every positive definite function and corresponding reproducing kernel Hilbert space has infinitely many associated feature maps such that (3) holds. For example, we can trivially take F = H and \varphi(x) = K_x for all x \in X. Then (3) is satisfied by the reproducing property. Another classical example of a feature map relates to the previous section regarding integral operators, by taking F = \ell^2 and \varphi(x) = (\sqrt{\sigma_i} \varphi_i(x))_i.

This connection between kernels and feature maps provides us with a new way to understand positive definite functions and hence reproducing kernels as inner products in H. Moreover, every feature map can naturally define an RKHS by means of the definition of a positive definite function.

Lastly, feature maps allow us to construct function spaces that reveal another perspective on the RKHS. Consider the linear space

: H_\varphi = \{ f : X \to \mathbb{R} \mid \exists w \in F, \ f(x) = \langle w, \varphi(x) \rangle_F, \ \forall x \in X \}.

We can define a norm on H_\varphi by

: \|f\|_\varphi = \inf \{ \|w\|_F : w \in F, \ f(x) = \langle w, \varphi(x) \rangle_F, \ \forall x \in X \}.

It can be shown that H_\varphi is an RKHS with kernel defined by K(x, y) = \langle \varphi(x), \varphi(y) \rangle_F. This representation implies that the elements of the RKHS are inner products of elements in the feature space and can accordingly be seen as hyperplanes. This view of the RKHS is related to the kernel trick in machine learning.
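As a concrete instance of (3), the homogeneous quadratic kernel K(x, y) = \langle x, y \rangle^2 on \mathbb{R}^2 admits the explicit feature map \varphi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2), one of its infinitely many admissible feature maps. A quick sketch checking K(x, y) = \langle \varphi(x), \varphi(y) \rangle_F numerically, with the test points chosen arbitrarily:

```python
# Sketch: an explicit feature map for K(x, y) = <x, y>^2 on R^2.
import numpy as np

def phi(x):
    # Maps R^2 into the feature space F = R^3
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, y = np.array([1.0, -2.0]), np.array([0.5, 3.0])
lhs = phi(x) @ phi(y)      # inner product in feature space
rhs = (x @ y) ** 2         # kernel evaluation
assert np.isclose(lhs, rhs)
print(lhs, rhs)
```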


Properties

The following properties of RKHSs may be useful to readers.

* Let (X_i)_{i=1}^p be a sequence of sets and (K_i)_{i=1}^p be a collection of corresponding positive definite functions on (X_i)_{i=1}^p. It then follows that
*:: K((x_1, \ldots, x_p), (y_1, \ldots, y_p)) = K_1(x_1, y_1) \cdots K_p(x_p, y_p)
*: is a kernel on X = X_1 \times \dots \times X_p.
* Let X_0 \subset X; then the restriction of K to X_0 \times X_0 is also a reproducing kernel.
* Consider a normalized kernel K such that K(x, x) = 1 for all x \in X. Define a pseudo-metric on X as
*:: d_K(x, y) = \|K_x - K_y\|_H^2 = 2(1 - K(x, y)), \qquad \forall x, y \in X.
*: By the Cauchy–Schwarz inequality,
*:: K(x, y)^2 \le K(x, x) K(y, y) = 1, \qquad \forall x, y \in X.
*: This inequality allows us to view K as a measure of similarity between inputs. If x, y \in X are similar then K(x, y) will be closer to 1, while if x, y \in X are dissimilar then K(x, y) will be closer to 0 (see the sketch after this list).
* The closure of the span of \{K_x \mid x \in X\} coincides with H.
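Here is a brief sketch of the first and third properties above: a product kernel on a product space, and the pseudo-metric induced by a normalized kernel. The component kernels are chosen purely for illustration.

```python
# Sketch: a product kernel on X_1 x X_2 and the pseudo-metric d_K.
import numpy as np

k1 = lambda x, y: np.exp(-(x - y) ** 2)            # normalized: k1(x, x) = 1
k2 = lambda s, t: np.exp(-abs(s - t))              # normalized: k2(s, s) = 1
K  = lambda p, q: k1(p[0], q[0]) * k2(p[1], q[1])  # kernel on the product space

p, q = (0.0, 1.0), (0.3, 0.5)
d_K = 2 * (1 - K(p, q))                            # = ||K_p - K_q||_H^2
print("K(p, q) =", K(p, q), " d_K(p, q) =", d_K)   # similar inputs: K near 1, d_K near 0
```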


Common examples


Bilinear kernels

: K(x, y) = \langle x, y \rangle

The RKHS H corresponding to this kernel is the dual space, consisting of functions f(x) = \langle x, \beta \rangle satisfying \|f\|_H^2 = \|\beta\|^2.


Polynomial kernels

: K(x,y) = (\alpha\langle x,y \rangle + 1)^d, \qquad \alpha \in \R, d \in \N


Radial basis function kernels

These are another common class of kernels which satisfy K(x, y) = K(\|x - y\|). Some examples include:

* Gaussian or squared exponential kernel:
*:: K(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}, \qquad \sigma > 0
* Laplacian kernel:
*:: K(x, y) = e^{-\frac{\|x - y\|}{\sigma}}, \qquad \sigma > 0
*: The squared norm of a function f in the RKHS H with this kernel (on \mathbb{R}) is, as verified numerically in the sketch after this list,
*:: \|f\|_H^2 = \frac{1}{2} \int_{\mathbb{R}} \left( \frac{1}{\sigma} f(x)^2 + \sigma f'(x)^2 \right) \mathrm{d}x.
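As a sanity check of the Laplacian-kernel norm formula (with the 1/2 normalization stated above, which is what makes \langle K_x, K_x \rangle_H = K(x, x) come out right), the kernel section K_x itself should satisfy \|K_x\|_H^2 = K(x, x) = 1. A rough numerical sketch, with the grid and the finite-difference derivative as ad hoc choices:

```python
# Sketch: checking ||K_x||_H^2 = 1 for the Laplacian kernel on R.
import numpy as np

sigma, x0 = 0.7, 0.0
y  = np.linspace(-30.0, 30.0, 600_001)
Kx = np.exp(-np.abs(y - x0) / sigma)        # K_x(y) = e^{-|y - x|/sigma}
dKx = np.gradient(Kx, y)                    # K_x'(y) by finite differences
h = y[1] - y[0]
norm_sq = 0.5 * ((Kx ** 2 / sigma + sigma * dKx ** 2).sum() * h)
print(norm_sq)                              # should be close to 1.0
```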


Bergman kernels

We also provide examples of Bergman kernels. Let ''X'' be finite and let ''H'' consist of all complex-valued functions on ''X''. Then an element of ''H'' can be represented as an array of complex numbers. If the usual inner product is used, then K_x is the function whose value is 1 at x and 0 everywhere else, and K(x, y) can be thought of as an identity matrix since

: K(x, y) = \begin{cases} 1 & x = y \\ 0 & x \neq y \end{cases}

In this case, ''H'' is isomorphic to \Complex^n.

The case of X = \mathbb{D} (where \mathbb{D} denotes the unit disc) is more sophisticated. Here the Bergman space H^2(\mathbb{D}) is the space of square-integrable holomorphic functions on \mathbb{D}. It can be shown that the reproducing kernel for H^2(\mathbb{D}) is

: K(x, y) = \frac{1}{\pi} \frac{1}{(1 - x \overline{y})^2}.

Lastly, the space of band limited functions in L^2(\mathbb{R}) with bandwidth 2a is an RKHS with reproducing kernel

: K(x, y) = \frac{\sin(a(x - y))}{\pi (x - y)}.
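The disc case can be checked against a monomial expansion: the functions \sqrt{(n+1)/\pi}\, z^n form an orthonormal basis of H^2(\mathbb{D}), so K(x, y) = \frac{1}{\pi} \sum_{n \ge 0} (n+1) (x \overline{y})^n, which sums to the closed form above for |x|, |y| < 1. A quick numerical sketch, with two arbitrary points in the disc:

```python
# Sketch: Bergman kernel of the unit disc vs. its monomial-basis series.
import numpy as np

x, y = 0.3 + 0.4j, -0.2 + 0.1j               # two points in the open unit disc
closed = 1.0 / (np.pi * (1 - x * np.conj(y)) ** 2)
n = np.arange(200)
series = ((n + 1) * (x * np.conj(y)) ** n).sum() / np.pi
print(closed, series)                        # the two values should agree
```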


Extension to vector-valued functions

In this section we extend the definition of the RKHS to spaces of vector-valued functions, as this extension is particularly important in multi-task learning and manifold regularization. The main difference is that the reproducing kernel \Gamma is a symmetric function that is now a positive semi-definite ''matrix'' for every x, y in X. More formally, we define a vector-valued RKHS (vvRKHS) as a Hilbert space of functions f : X \to \mathbb{R}^T such that for all c \in \mathbb{R}^T and x \in X

: \Gamma_x c(y) = \Gamma(x, y) c \in H \quad \text{for } y \in X

and

: \langle f, \Gamma_x c \rangle_H = f(x)^\intercal c.

This second property parallels the reproducing property for the scalar-valued case. This definition can also be connected to integral operators, bounded evaluation functionals, and feature maps, as we saw for the scalar-valued RKHS. We can equivalently define the vvRKHS as a vector-valued Hilbert space with a bounded evaluation functional and show that this implies the existence of a unique reproducing kernel by the Riesz representation theorem. Mercer's theorem can also be extended to address the vector-valued setting, and we can therefore obtain a feature-map view of the vvRKHS. Lastly, it can also be shown that the closure of the span of \{\Gamma_x c : x \in X, c \in \mathbb{R}^T\} coincides with H, another property similar to the scalar-valued case.

We can gain intuition for the vvRKHS by taking a component-wise perspective on these spaces. In particular, we find that every vvRKHS is isometrically isomorphic to a scalar-valued RKHS on a particular input space. Let \Lambda = \{1, \dots, T\}. Consider the space X \times \Lambda and the corresponding reproducing kernel

: \gamma : (X \times \Lambda) \times (X \times \Lambda) \to \mathbb{R}. \qquad (4)

As noted above, the RKHS associated to this reproducing kernel is given by the closure of the span of \{\gamma_{(x,t)} : x \in X, t \in \Lambda\}, where \gamma_{(x,t)}(y, s) = \gamma((x, t), (y, s)) for every pair (y, s) \in X \times \Lambda.

The connection to the scalar-valued RKHS can then be made by the fact that every matrix-valued kernel can be identified with a kernel of the form of (4) via

: \Gamma(x, y)_{(t,s)} = \gamma((x, t), (y, s)).

Moreover, every kernel of the form of (4) defines a matrix-valued kernel with the above expression. Now letting the map D : H_\Gamma \to H_\gamma be defined as

: (Df)(x, t) = \langle f(x), e_t \rangle_{\mathbb{R}^T}

where e_t is the t^\text{th} component of the canonical basis for \mathbb{R}^T, one can show that D is bijective and an isometry between H_\Gamma and H_\gamma.

While this view of the vvRKHS can be useful in multi-task learning, the isometry does not reduce the study of the vector-valued case to that of the scalar-valued case. In fact, this isometry procedure can make both the scalar-valued kernel and the input space too difficult to work with in practice, as properties of the original kernels are often lost.

An important class of matrix-valued reproducing kernels are ''separable'' kernels, which can be factorized as the product of a scalar-valued kernel and a T-dimensional symmetric positive semi-definite matrix. In light of our previous discussion, these kernels are of the form

: \gamma((x, t), (y, s)) = K(x, y) K_T(t, s)

for all x, y in X and t, s in \Lambda. As the scalar-valued kernel encodes dependencies between the inputs, we can observe that the matrix-valued kernel encodes dependencies among both the inputs and the outputs.

We lastly remark that the above theory can be further extended to spaces of functions with values in function spaces, but obtaining kernels for these spaces is a more difficult task (Rosasco and Poggio).
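For separable kernels the component-wise picture is especially transparent: the Gram matrix over n inputs and T outputs is a Kronecker product. A sketch, with an illustrative scalar kernel and an assumed task-similarity matrix B standing in for K_T:

```python
# Sketch: a separable matrix-valued kernel Gamma(x, y) = K(x, y) * B.
import numpy as np

K = lambda x, y: np.exp(-(x - y) ** 2)       # scalar kernel on the inputs
B = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])              # symmetric PSD T x T task matrix

Gamma = lambda x, y: K(x, y) * B             # a T x T matrix for every (x, y)

xs = np.array([0.0, 1.0, 2.5])
K_in = K(xs[:, None], xs[None, :])           # Gram matrix over the inputs
G = np.kron(K_in, B)                         # full (nT) x (nT) Gram matrix
print("PSD:", np.linalg.eigvalsh(G).min() >= -1e-10)
```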


Connection between RKHSs and the ReLU function

The ReLU function is commonly defined as f(x) = \max(0, x) and is a mainstay in the architecture of neural networks, where it is used as an activation function. One can construct a ReLU-like nonlinear function using the theory of reproducing kernel Hilbert spaces. Below, we derive this construction and show how it implies the representation power of neural networks with ReLU activations.

We will work with the Hilbert space \mathcal{H} = L^1_2(0)[0, \infty) of absolutely continuous functions with f(0) = 0 and square integrable (i.e. L_2) derivative. It has the inner product

: \langle f, g \rangle_{\mathcal{H}} = \int_0^\infty f'(x) g'(x) \, dx.

To construct the reproducing kernel it suffices to consider a dense subspace, so let f \in C^1[0, \infty) and f(0) = 0. The fundamental theorem of calculus then gives

: f(y) = \int_0^y f'(x) \, dx = \int_0^\infty G(x, y) f'(x) \, dx = \langle K_y, f \rangle

where

: G(x, y) = \begin{cases} 1, & x < y \\ 0, & \text{otherwise} \end{cases}

and K_y'(x) = G(x, y), \ K_y(0) = 0, i.e.

: K(x, y) = K_y(x) = \int_0^x G(z, y) \, dz = \begin{cases} x, & 0 \le x < y \\ y, & \text{otherwise} \end{cases} = \min(x, y).

This implies K_y = K(\cdot, y) reproduces f. Moreover, the minimum function on X \times X = [0, \infty) \times [0, \infty) has the following representations with the ReLU function:

: \min(x, y) = x - \operatorname{ReLU}(x - y) = y - \operatorname{ReLU}(y - x).

Using this formulation, we can apply the representer theorem to the RKHS, letting one prove the optimality of using ReLU activations in neural network settings.
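Both the min-kernel identity and the reproducing property above can be verified numerically. A sketch, with the test function and grid chosen arbitrarily and the integral taken as a crude Riemann sum:

```python
# Sketch: min(x, y) = x - relu(x - y), and <K_y, f>_H = f(y) for f(0) = 0.
import numpy as np

relu = lambda t: np.maximum(t, 0.0)
x, y = 1.7, 0.6
assert np.isclose(min(x, y), x - relu(x - y))
assert np.isclose(min(x, y), y - relu(y - x))

f  = lambda t: t * np.sin(t)                 # smooth, with f(0) = 0
df = lambda t: np.sin(t) + t * np.cos(t)     # f'
t = np.linspace(0.0, y, 200_001)             # G(x, y) = 1 exactly when x < y
inner = df(t).sum() * (t[1] - t[0])          # int_0^y f'(x) dx = <K_y, f>_H
print(inner, f(y))                           # should nearly agree
```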


See also

* Positive definite kernel
* Mercer's theorem
* Kernel trick
* Kernel embedding of distributions
* Representer theorem




References

* Alvarez, Mauricio; Rosasco, Lorenzo; and Lawrence, Neil. "Kernels for Vector-Valued Functions: a Review," https://arxiv.org/abs/1106.6251, June 2011.
* Berlinet, Alain and Thomas-Agnan, Christine. ''Reproducing Kernel Hilbert Spaces in Probability and Statistics'', Kluwer Academic Publishers, 2004.
* De Vito, Ernesto; Umanità, Veronica; and Villa, Silvia. "An extension of Mercer theorem to vector-valued measurable kernels," June 2013.
* Durrett, Greg. 9.520 Course Notes, Massachusetts Institute of Technology, https://www.mit.edu/~9.520/scribe-notes/class03_gdurett.pdf, February 2010.
* Okutmustur, Baver. "Reproducing Kernel Hilbert Spaces," M.S. dissertation, Bilkent University, http://www.thesis.bilkent.edu.tr/0002953.pdf, August 2005.
* Paulsen, Vern. "An Introduction to the Theory of Reproducing Kernel Hilbert Spaces," http://www.math.uh.edu/~vern/rkhs.pdf.
* Rosasco, Lorenzo and Poggio, Tomaso. "A Regularization Tour of Machine Learning – MIT 9.520 Lecture Notes," manuscript, December 2014.
* Wahba, Grace. ''Spline Models for Observational Data'', SIAM, 1990.
* Zhang, Haizhang; Xu, Yuesheng; and Zhang, Qinghui. "Refinement of Operator-valued Reproducing Kernels," ''Journal of Machine Learning Research'', vol. 13, pp. 91–136, 2012. http://www.jmlr.org/papers/volume13/zhang12a/zhang12a.pdf