In machine learning, tree kernels are the application of the more general concept of positive-definite kernel to tree structures. They find applications in natural language processing, where they can be used for machine-learned parsing or classification of sentences.
Motivation
In natural language processing, it is often necessary to compare tree structures (e.g. parse trees) for similarity. Such comparisons can be performed by computing dot products of vectors of features of the trees, but these vectors tend to be very large: NLP techniques have come to a point where a simple dependency relation over two words is encoded with a vector of several million features.
It can be impractical to represent complex structures such as trees with feature vectors. Well-designed kernels allow computing similarity over trees without explicitly computing the feature vectors of these trees. Moreover, kernel methods have been widely used in machine learning tasks (e.g. SVM), so many algorithms either work natively with kernels or have an extension that handles kernelization.
An example application is the classification of sentences, such as sorting questions into different types.
Examples

This section presents two examples of tree kernels applied to the constituency trees of the sentences "A cat eats a mouse." and "A mouse eats a cat.". In this example "A" and "a" are the same word, and in most NLP applications they would be represented with the same token.
These two kernels are interesting because they exhibit very different granularity (the subset tree kernel being far more fine-grained than the subtree kernel) for the same computational complexity: both can be computed recursively in time ''O''(|''T''<sub>1</sub>| · |''T''<sub>2</sub>|).
Subtree kernel
In the case of a constituency tree, a subtree is defined as a node together with all of its descendants (e.g., [NP [D [a]] [N [mouse]]] is a subtree of the two trees). Terminals are not considered subtrees (e.g. [a] is not a subtree). The subtree kernel counts the number of common subtrees between two given trees.
In this example, there are seven common subtrees:
:[NP [D [a]] [N [cat]]],
:[NP [D [a]] [N [mouse]]],
:[N [mouse]],
:[N [cat]],
:[V [eats]],
:[D [a]] (counted twice as it appears twice).
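The counting above can be sketched in a few lines of Python. The nested-tuple tree representation and the function names below are illustrative choices, not from the original text; following the example's convention, a subtree shared ''k'' times by one tree and ''m'' times by the other is counted min(''k'', ''m'') times, which reproduces the count of seven for the two sentences.

```python
from collections import Counter

def subtrees(tree):
    """Yield every subtree: a non-terminal node together with all its
    descendants. Bare terminals (plain strings) are not subtrees."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def subtree_kernel(t1, t2):
    """Count common subtrees; a subtree appearing k times in t1 and
    m times in t2 contributes min(k, m), as in the enumeration above."""
    c1, c2 = Counter(subtrees(t1)), Counter(subtrees(t2))
    return sum(min(c1[s], c2[s]) for s in c1 if s in c2)

# Constituency trees for "A cat eats a mouse." and "A mouse eats a cat."
t1 = ("S", ("NP", ("D", "a"), ("N", "cat")),
           ("VP", ("V", "eats"), ("NP", ("D", "a"), ("N", "mouse"))))
t2 = ("S", ("NP", ("D", "a"), ("N", "mouse")),
           ("VP", ("V", "eats"), ("NP", ("D", "a"), ("N", "cat"))))

print(subtree_kernel(t1, t2))  # 7, matching the enumeration above
```

Enumerating all subtrees explicitly, as here, is quadratic in the worst case; in practice the kernel is usually computed by the recursive node-pair comparison mentioned above rather than by materialising the subtree multisets.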
Subset tree kernel
A subset tree is a more general structure than a subtree. The basic definition is the same, but in the case of subset trees, leaves need not be terminals (e.g., [VP [V] [NP]] is a subset tree of both trees); however, a single node is not considered a tree. Because of this more general definition, there are more subset trees than subtrees, and more common subset trees than common subtrees.
In this example, there are 54 common subset trees: the seven common subtrees, plus, among others:
:[NP [D] [N]] (counted twice),
:[VP [V [eats]] [NP]]...
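A standard way to compute a subset tree kernel without enumerating subset trees is the recursion of Collins and Duffy: for each pair of nodes whose productions match, the number of common subset trees rooted at that pair is the product over their children of (1 + Δ of the child pair), with Δ = 1 at matching pre-terminals. The sketch below (same illustrative tuple representation as above) shows that recursion. Note that it counts ''pairs'' of matching occurrences, a multiplicity convention that need not reproduce the per-type count of 54 given above.

```python
def _label(c):
    """Label of a child: the string itself for a terminal, else the root tag."""
    return c if isinstance(c, str) else c[0]

def delta(n1, n2):
    """Common subset trees rooted at the node pair (n1, n2), following the
    Collins-Duffy recursion. Nodes are (label, *children) tuples."""
    # Productions must match exactly: same parent label, same child labels.
    if n1[0] != n2[0] or [_label(c) for c in n1[1:]] != [_label(c) for c in n2[1:]]:
        return 0
    prod = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):      # terminal children cannot be expanded
            prod *= 1 + delta(c1, c2)
    return prod

def nodes(tree):
    """Yield every non-terminal node of the tree."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from nodes(child)

def subset_tree_kernel(t1, t2):
    """Sum the per-pair counts over all node pairs of the two trees."""
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

# Constituency trees for "A cat eats a mouse." and "A mouse eats a cat."
t1 = ("S", ("NP", ("D", "a"), ("N", "cat")),
           ("VP", ("V", "eats"), ("NP", ("D", "a"), ("N", "mouse"))))
t2 = ("S", ("NP", ("D", "a"), ("N", "mouse")),
           ("VP", ("V", "eats"), ("NP", ("D", "a"), ("N", "cat"))))

print(subset_tree_kernel(t1, t2))  # sums delta over all node pairs
```

Because Δ is evaluated once per node pair and each evaluation is linear in the number of children, the whole sum runs in the ''O''(|''T''<sub>1</sub>| · |''T''<sub>2</sub>|) time stated above.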
See also
*Graph kernel
*Parse tree
Notes
{{reflist
References
*Jun Sun, Min Zhang and Chew Lim Tan. ''Tree Sequence Kernel for Natural Language''
*Alessandro Moschitti. ''Making Tree Kernels practical for Natural Language Learning''
External links
*http://disi.unitn.it/moschitti/Tree-Kernel.htm: application of tree kernels to SVMs, on Alessandro Moschitti's web page.