Statistical learning theory is a framework for
machine learning
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
drawing from the fields of
statistics
Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
and
functional analysis
Functional analysis is a branch of mathematical analysis, the core of which is formed by the study of vector spaces endowed with some kind of limit-related structure (for example, Inner product space#Definition, inner product, Norm (mathematics ...
. Statistical learning theory deals with the
statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as
computer vision
Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...
,
speech recognition, and
bioinformatics.
Introduction
The goals of learning are understanding and prediction. Learning falls into many categories, including
supervised learning,
unsupervised learning,
online learning, and
reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a
training set of data. Every point in the training is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input.
Depending on the type of output, supervised learning problems are either problems of
regression or problems of
classification
Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves (for example through cluster analysis). Examples include diagnostic tests, identif ...
. If the output takes a continuous range of values, it is a regression problem. Using
Ohm's law as an example, a regression could be performed with voltage as input and current as an output. The regression would find the functional relationship between voltage and current to be such that
Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In
facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture.
After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set.
Formal description
Take
to be the
vector space of all possible inputs, and
to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown
probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
over the product space
, i.e. there exists some unknown
. The training set is made up of
samples from this probability distribution, and is notated
Every
is an input vector from the training data, and
is the output that corresponds to it.
In this formalism, the inference problem consists of finding a function
such that
. Let
be a space of functions
called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let
be the
loss function, a metric for the difference between the predicted value
and the actual value
. The
expected risk is defined to be
The target function, the best possible function
that can be chosen, is given by the
that satisfies