In
statistics, a quadratic classifier is a
statistical classifier that uses a
quadratic
In mathematics, the term quadratic describes something that pertains to squares, to the operation of squaring, to terms of the second degree, or equations or formulas that involve such terms. ''Quadratus'' is Latin for ''square''.
Mathematics ...
decision surface to separate measurements of two or more classes of objects or events. It is a more general version of the
linear classifier.
The classification problem
Statistical classification
In statistics, classification is the problem of identifying which of a set of categories (sub-populations) an observation (or observations) belongs to. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diag ...
considers a set of
vectors of observations x of an object or event, each of which has a known type ''y''. This set is referred to as the
training set
In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from ...
. The problem is then to determine, for a given new observation vector, what the best class should be. For a quadratic classifier, the correct solution is assumed to be quadratic in the measurements, so ''y'' will be decided based on
:
In the special case where each observation consists of two measurements, this means that the surfaces separating the classes will be
conic sections
In mathematics, a conic section, quadratic curve or conic is a curve obtained as the intersection of the surface of a cone with a plane. The three types of conic section are the hyperbola, the parabola, and the ellipse; the circle is a s ...
(''i.e.'' either a
line
Line most often refers to:
* Line (geometry), object with zero thickness and curvature that stretches to infinity
* Telephone line, a single-user circuit on a telephone communication system
Line, lines, The Line, or LINE may also refer to:
Art ...
, a
circle
A circle is a shape consisting of all points in a plane that are at a given distance from a given point, the centre. Equivalently, it is the curve traced out by a point that moves in a plane so that its distance from a given point is const ...
or
ellipse, a
parabola
In mathematics, a parabola is a plane curve which is mirror-symmetrical and is approximately U-shaped. It fits several superficially different mathematical descriptions, which can all be proved to define exactly the same curves.
One descri ...
or a
hyperbola
In mathematics, a hyperbola (; pl. hyperbolas or hyperbolae ; adj. hyperbolic ) is a type of smooth curve lying in a plane, defined by its geometric properties or by equations for which it is the solution set. A hyperbola has two pieces, c ...
). In this sense, we can state that a quadratic model is a generalization of the linear model, and its use is justified by the desire to extend the classifier's ability to represent more complex separating surfaces.
Quadratic discriminant analysis
Quadratic discriminant analysis (QDA) is closely related to
linear discriminant analysis
Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features ...
(LDA), where it is assumed that the measurements from each class are
normally distributed
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu is ...
. Unlike LDA however, in QDA there is no assumption that the
covariance
In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the le ...
of each of the classes is identical. When the normality assumption is true, the best possible test for the hypothesis that a given measurement is from a given class is the
likelihood ratio test. Suppose there are only two groups, with means
and covariance matrices
corresponding to
and
respectively. Then the likelihood ratio is given by
:
for some threshold
. After some rearrangement, it can be shown that the resulting separating surface between the classes is a quadratic. The sample estimates of the mean vector and variance-covariance matrices will substitute the population quantities in this formula.
Other
While QDA is the most commonly-used method for obtaining a classifier, other methods are also possible. One such method is to create a longer measurement vector from the old one by adding all pairwise products of individual measurements. For instance, the vector
:
would become
:
Finding a quadratic classifier for the original measurements would then become the same as finding a linear classifier based on the expanded measurement vector. This observation has been used in extending neural network models; the "circular" case, which corresponds to introducing only the sum of pure quadratic terms
with no mixed products (
), has been proven to be the optimal compromise between extending the classifier's representation power and controlling the risk of overfitting (
Vapnik-Chervonenkis dimension).
For linear classifiers based only on
dot product
In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an alg ...
s, these expanded measurements do not have to be actually computed, since the dot product in the higher-dimensional space is simply related to that in the original space. This is an example of the so-called
kernel trick
In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example c ...
, which can be applied to linear discriminant analysis as well as the
support vector machine
In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories ...
.
References
Sources:
*
{{DEFAULTSORT:Quadratic Classifier
Classification algorithms
Statistical classification