
In machine learning and mathematical optimization, loss functions for classification are computationally feasible loss functions representing the price paid for inaccuracy of predictions in classification problems (problems of identifying which category a particular observation belongs to).
Given $\mathcal{X}$ as the space of all possible inputs (usually $\mathcal{X} \subset \mathbb{R}^d$), and $\mathcal{Y} = \{-1, 1\}$ as the set of labels (possible outputs), a typical goal of classification algorithms is to find a function $f : \mathcal{X} \to \mathbb{R}$ which best predicts a label $y$ for a given input $\vec{x}$.
However, because of incomplete information, noise in the measurement, or probabilistic components in the underlying process, it is possible for the same $\vec{x}$ to generate different $y$.
As a result, the goal of the learning problem is to minimize expected loss (also known as the risk), defined as
: $I[f] = \int_{\mathcal{X} \times \mathcal{Y}} V(f(\vec{x}), y) \, p(\vec{x}, y) \, d\vec{x} \, dy$
where $V(f(\vec{x}), y)$ is a given loss function, and $p(\vec{x}, y)$ is the probability density function of the process that generated the data, which can equivalently be written as
: $p(\vec{x}, y) = p(y \mid \vec{x}) \, p(\vec{x}).$
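As a concrete illustration (not part of the article's exposition), the expected risk can be approximated by averaging a loss over samples drawn from $p(\vec{x}, y)$. The data process and loss below are hypothetical choices made for the sketch:

```python
import random

def zero_one_loss(fx, y):
    # V(f(x), y): 1 if the sign of the prediction disagrees with the label y.
    return 0.0 if (1 if fx >= 0 else -1) == y else 1.0

def sample_xy():
    # Hypothetical process p(x, y) = p(y | x) p(x):
    # x ~ Uniform(-1, 1), then y = +1 with probability p(1 | x) = (1 + x) / 2.
    x = random.uniform(-1.0, 1.0)
    y = 1 if random.random() < 0.5 * (1.0 + x) else -1
    return x, y

def approximate_risk(f, n=100_000):
    # Monte Carlo estimate of I[f] = E[ V(f(x), y) ] over draws from p(x, y).
    return sum(zero_one_loss(f(x), y)
               for x, y in (sample_xy() for _ in range(n))) / n

random.seed(0)
risk = approximate_risk(lambda x: x)  # classifier predicting sign(x)
```

For this data process the classifier $\operatorname{sign}(x)$ has true risk $1/4$, and the Monte Carlo estimate lands close to that value.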
Within classification, several commonly used loss functions are written solely in terms of the product of the true label $y$ and the predicted label $f(\vec{x})$. Therefore, they can be defined as functions of only one variable $\upsilon = y f(\vec{x})$, so that $V(f(\vec{x}), y) = \phi(y f(\vec{x})) = \phi(\upsilon)$ with a suitably chosen function $\phi : \mathbb{R} \to \mathbb{R}$. These are called margin-based loss functions. Choosing a margin-based loss function amounts to choosing $\phi$. Selection of a loss function within this framework impacts the optimal $f^{*}_{\phi}$ which minimizes the expected risk; see empirical risk minimization.
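For instance, several standard margin-based losses can be written directly as functions $\phi(\upsilon)$ of the margin $\upsilon = y f(\vec{x})$. The sketch below defines a few common choices (the function names and the $1/\ln 2$ normalization of the logistic loss are our own conventions for this illustration):

```python
import math

# Margin-based losses phi(v), where v = y * f(x).

def hinge_loss(v):
    # phi(v) = max(0, 1 - v), used by support vector machines.
    return max(0.0, 1.0 - v)

def logistic_loss(v):
    # phi(v) = log(1 + exp(-v)) / log(2), normalized so that phi(0) = 1.
    return math.log1p(math.exp(-v)) / math.log(2)

def exponential_loss(v):
    # phi(v) = exp(-v), used by AdaBoost.
    return math.exp(-v)

def square_loss(v):
    # phi(v) = (1 - v)^2; note it also penalizes confident correct predictions.
    return (1.0 - v) ** 2

# Evaluating V(f(x), y) = phi(y * f(x)) for a prediction f(x) = 0.8, label y = 1:
v = 1 * 0.8
losses = {fn.__name__: fn(v)
          for fn in (hinge_loss, logistic_loss, exponential_loss, square_loss)}
```

All four are decreasing near $\upsilon = 0$, so each penalizes margins that are small or negative, i.e. predictions that disagree with the true label.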
In the case of binary classification, it is possible to simplify the calculation of expected risk from the integral specified above. Specifically,
: $\begin{aligned} I[f] &= \int_{\mathcal{X} \times \mathcal{Y}} V(f(\vec{x}), y) \, p(\vec{x}, y) \, d\vec{x} \, dy \\ &= \int_{\mathcal{X}} \int_{\mathcal{Y}} \phi(y f(\vec{x})) \, p(y \mid \vec{x}) \, p(\vec{x}) \, dy \, d\vec{x} \\ &= \int_{\mathcal{X}} \left[ \phi(f(\vec{x})) \, p(1 \mid \vec{x}) + \phi(-f(\vec{x})) \, p(-1 \mid \vec{x}) \right] p(\vec{x}) \, d\vec{x} \\ &= \int_{\mathcal{X}} \left[ \phi(f(\vec{x})) \, p(1 \mid \vec{x}) + \phi(-f(\vec{x})) \, (1 - p(1 \mid \vec{x})) \right] p(\vec{x}) \, d\vec{x}. \end{aligned}$
The second equality follows from the properties described above. The third equality follows from the fact that 1 and −1 are the only possible values for $y$, and the fourth because $p(-1 \mid \vec{x}) = 1 - p(1 \mid \vec{x})$. The term within brackets, $\phi(f(\vec{x})) \, p(1 \mid \vec{x}) + \phi(-f(\vec{x})) \, (1 - p(1 \mid \vec{x}))$,
is known as the ''conditional risk.''
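To make the conditional risk concrete, one can fix $\eta = p(1 \mid \vec{x})$ at a single point and scan candidate values of $f(\vec{x})$. The brute-force sketch below (our own illustration, with $\eta = 0.8$ assumed) shows that for the exponential loss the pointwise minimum lands at the known closed form $f^{*} = \tfrac{1}{2} \ln \tfrac{\eta}{1 - \eta}$:

```python
import math

def conditional_risk(f, eta, phi):
    # The bracketed term: phi(f) p(1|x) + phi(-f) (1 - p(1|x)).
    return phi(f) * eta + phi(-f) * (1.0 - eta)

def exp_loss(v):
    # Exponential loss phi(v) = e^{-v}.
    return math.exp(-v)

eta = 0.8  # assumed value of p(1 | x) at this point x

# Brute-force scan for the pointwise minimizer f*(x) on a fine grid.
candidates = [i / 1000.0 for i in range(-3000, 3001)]
f_star = min(candidates, key=lambda f: conditional_risk(f, eta, exp_loss))

# Known minimizer of the conditional risk for the exponential loss.
closed_form = 0.5 * math.log(eta / (1.0 - eta))
```

Because the risk decomposes as an integral of the conditional risk weighted by $p(\vec{x})$, minimizing it pointwise in $f(\vec{x})$ for each $\vec{x}$ minimizes the whole integral.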
One can solve for the minimizer of $I[f]$ by minimizing the conditional risk pointwise for each $\vec{x}$.