
In machine learning and mathematical optimization, loss functions for classification are computationally feasible loss functions representing the price paid for inaccuracy of predictions in classification problems (problems of identifying which category a particular observation belongs to).
Given $\mathcal{X}$ as the space of all possible inputs (usually $\mathcal{X} \subset \mathbb{R}^d$), and $\mathcal{Y} = \{-1, 1\}$ as the set of labels (possible outputs), a typical goal of classification algorithms is to find a function $f : \mathcal{X} \to \mathcal{Y}$ which best predicts a label $y$ for a given input $\vec{x}$. However, because of incomplete information, noise in the measurement, or probabilistic components in the underlying process, it is possible for the same $\vec{x}$ to generate different $y$.
As a result, the goal of the learning problem is to minimize expected loss (also known as the risk), defined as
:$I[f] = \displaystyle\int_{\mathcal{X}\times\mathcal{Y}} V(f(\vec{x}),y)\,p(\vec{x},y)\,d\vec{x}\,dy$
where $V(f(\vec{x}),y)$ is a given loss function, and $p(\vec{x},y)$ is the probability density function of the process that generated the data, which can equivalently be written as
:$p(\vec{x},y) = p(y \mid \vec{x})\,p(\vec{x}).$
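When the density $p(\vec{x},y)$ is unknown, the expected risk is typically approximated by an average over samples (the empirical risk). A minimal Python sketch, assuming a hypothetical data-generating process ($y$ uniform on $\{-1,+1\}$, $x \sim \mathcal{N}(y,1)$), a sign classifier, and the 0–1 loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical process: labels y uniform on {-1, +1}, then x ~ N(y, 1).
n = 10_000
y = rng.choice([-1, 1], size=n)
x = rng.normal(loc=y, scale=1.0)

def f(x):
    """A simple sign classifier f: X -> {-1, +1}."""
    return np.where(x >= 0, 1, -1)

def zero_one_loss(fx, y):
    """V(f(x), y) = 1 if the prediction is wrong, else 0."""
    return (fx != y).astype(float)

# Empirical risk: the sample average approximating I[f].
risk = zero_one_loss(f(x), y).mean()
print(risk)  # close to Phi(-1) ~ 0.159 for this process
```

The sample average converges to $I[f]$ as the number of samples grows; for this particular process the true risk is $\Phi(-1)$, the chance a unit-variance Gaussian lands on the wrong side of zero.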
Within classification, several commonly used loss functions are written solely in terms of the product of the true label $y$ and the predicted label $f(\vec{x})$. Therefore, they can be defined as functions of only one variable $\upsilon = y f(\vec{x})$, so that $V(f(\vec{x}),y) = \phi(y f(\vec{x})) = \phi(\upsilon)$ with a suitably chosen function $\phi : \mathbb{R} \to \mathbb{R}$. These are called margin-based loss functions. Choosing a margin-based loss function amounts to choosing $\phi$. Selection of a loss function within this framework impacts the optimal $f^{*}_{\phi}$ which minimizes the expected risk.
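For illustration, three standard margin-based choices of $\phi$ (hinge, logistic, and exponential) can be sketched as functions of the margin $\upsilon = y f(\vec{x})$; the normalization of the logistic loss by $\log 2$ is a common convention so that all three agree at $\upsilon = 0$:

```python
import numpy as np

def hinge(v):
    """Hinge loss: phi(v) = max(0, 1 - v), used by support vector machines."""
    return np.maximum(0.0, 1.0 - v)

def logistic(v):
    """Logistic loss: phi(v) = log(1 + exp(-v)) / log(2), so that phi(0) = 1."""
    return np.log1p(np.exp(-v)) / np.log(2.0)

def exponential(v):
    """Exponential loss: phi(v) = exp(-v), used by AdaBoost."""
    return np.exp(-v)

# All three equal 1 at v = 0 and penalize negative margins
# (confident wrong predictions) far more than positive ones.
v = np.array([-2.0, 0.0, 2.0])
print(hinge(v))        # [3.    1.    0.   ]
print(logistic(v))     # [~3.07  1.0   ~0.18]
print(exponential(v))  # [~7.39  1.0   ~0.14]
```

Each of these is a convex upper bound on the 0–1 loss, which is what makes minimizing them computationally feasible where direct 0–1 minimization is not.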
In the case of binary classification, it is possible to simplify the calculation of expected risk from the integral specified above. Specifically,
:$\begin{aligned} I[f] &= \int_{\mathcal{X}\times\mathcal{Y}} V(f(\vec{x}),y)\,p(\vec{x},y)\,d\vec{x}\,dy \\ &= \int_{\mathcal{X}}\int_{\mathcal{Y}} \phi(y f(\vec{x}))\,p(y\mid\vec{x})\,p(\vec{x})\,dy\,d\vec{x} \\ &= \int_{\mathcal{X}} \left[\phi(f(\vec{x}))\,p(1\mid\vec{x}) + \phi(-f(\vec{x}))\,p(-1\mid\vec{x})\right] p(\vec{x})\,d\vec{x} \\ &= \int_{\mathcal{X}} \left[\phi(f(\vec{x}))\,p(1\mid\vec{x}) + \phi(-f(\vec{x}))\,(1 - p(1\mid\vec{x}))\right] p(\vec{x})\,d\vec{x} \end{aligned}$
The second equality follows from the properties described above. The third equality follows from the fact that 1 and −1 are the only possible values for $y$, and the fourth because $p(-1\mid\vec{x}) = 1 - p(1\mid\vec{x})$. The term within brackets, $\left[\phi(f(\vec{x}))\,p(1\mid\vec{x}) + \phi(-f(\vec{x}))\,(1 - p(1\mid\vec{x}))\right]$, is known as the ''conditional risk''.
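The conditional risk can be explored numerically. Writing $\eta = p(1\mid\vec{x})$ and $C(\eta, f) = \phi(f)\,\eta + \phi(-f)\,(1-\eta)$, the Python sketch below (assuming the exponential loss and a fixed value of $\eta$) minimizes $C$ over a grid of candidate values of $f(\vec{x})$ and compares the result against the minimizer $f^{*} = \tfrac{1}{2}\log\tfrac{\eta}{1-\eta}$ known in closed form for that loss:

```python
import numpy as np

def exponential(v):
    """Exponential loss phi(v) = exp(-v)."""
    return np.exp(-v)

def conditional_risk(phi, f, eta):
    """C(eta, f) = phi(f) * eta + phi(-f) * (1 - eta)."""
    return phi(f) * eta + phi(-f) * (1 - eta)

eta = 0.8  # assumed value of p(1 | x) at some fixed point x
f_grid = np.linspace(-3.0, 3.0, 100_001)
risks = conditional_risk(exponential, f_grid, eta)

f_star_numeric = f_grid[np.argmin(risks)]               # grid minimizer
f_star_closed = 0.5 * np.log(eta / (1 - eta))           # closed form for exp loss
print(f_star_numeric, f_star_closed)  # both close to 0.693
```

Setting the derivative $-e^{-f}\eta + e^{f}(1-\eta)$ to zero gives $e^{2f} = \eta/(1-\eta)$, which yields the closed form used above; the grid search recovers the same value to within the grid spacing.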
One can solve for the minimizer of