Quadratic Classifier

In statistics, a quadratic classifier is a statistical classifier that uses a quadratic decision surface to separate measurements of two or more classes of objects or events. It is a more general version of the linear classifier.


The classification problem

Statistical classification considers a set of vectors of observations \mathbf{x} of an object or event, each of which has a known type y . This set is referred to as the training set. The problem is then to determine, for a given new observation vector, what the best class should be. For a quadratic classifier, the correct solution is assumed to be quadratic in the measurements, so the class will be decided based on

\mathbf{x}^{\mathsf T} \mathbf{A} \mathbf{x} + \mathbf{b}^{\mathsf T} \mathbf{x} + c ,

where \mathbf{A} is a matrix of quadratic coefficients, \mathbf{b} a vector of linear coefficients, and c a scalar offset. In the special case where each observation consists of two measurements, this means that the surfaces separating the classes will be conic sections (i.e., either a line, a circle or ellipse, a parabola, or a hyperbola). In this sense, the quadratic model is a generalization of the linear model, and its use is justified by the desire to extend the classifier's ability to represent more complex separating surfaces.
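As an illustration, here is a minimal sketch in Python (with NumPy) of how such a decision function could be evaluated; the values of \mathbf{A} , \mathbf{b} and c below are arbitrary assumptions chosen for the example, not fitted from data:

    import numpy as np

    # A quadratic classifier assigns a class from the sign of
    # f(x) = x^T A x + b^T x + c.
    # A, b and c are arbitrary illustrative values, not fitted from data.
    A = np.array([[1.0, 0.0],
                  [0.0, -2.0]])  # indefinite quadratic term: the boundary is a hyperbola
    b = np.array([0.5, -1.0])    # linear term
    c = -1.0                     # constant offset

    def quadratic_decision(x):
        """Return class 1 if f(x) > 0, else class 0."""
        f = x @ A @ x + b @ x + c
        return int(f > 0)

    print(quadratic_decision(np.array([1.5, 0.2])))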


Quadratic discriminant analysis

Quadratic discriminant analysis (QDA) is closely related to linear discriminant analysis (LDA), where the measurements from each class are assumed to be normally distributed. Unlike LDA, however, QDA does not assume that the covariance of each of the classes is identical. When the normality assumption holds, the best possible test for the hypothesis that a given measurement is from a given class is the likelihood ratio test. Suppose there are only two groups, with means \mu_0, \mu_1 and covariance matrices \Sigma_0, \Sigma_1 corresponding to y = 0 and y = 1 respectively. Then the likelihood ratio test compares

\text{LR} = \frac{ |\Sigma_1|^{-1/2} \exp\left( -\tfrac{1}{2} (\mathbf{x} - \mu_1)^{\mathsf T} \Sigma_1^{-1} (\mathbf{x} - \mu_1) \right) }{ |\Sigma_0|^{-1/2} \exp\left( -\tfrac{1}{2} (\mathbf{x} - \mu_0)^{\mathsf T} \Sigma_0^{-1} (\mathbf{x} - \mu_0) \right) } < t

for some threshold t (the (2\pi)^{-d/2} normalizing factors cancel in the ratio). After some rearrangement, it can be shown that the resulting separating surface between the classes is a quadratic. The sample estimates of the mean vector and variance-covariance matrices substitute for the population quantities in this formula.
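A minimal sketch of this rule in Python with NumPy, assuming two classes and a threshold of t = 1 (i.e. \log t = 0 , corresponding to equal priors); the helper names and the toy data are illustrative, and the sample means and covariances stand in for the population quantities:

    import numpy as np

    def fit_gaussian(X):
        """Sample mean and covariance of the rows of X (one observation per row)."""
        return X.mean(axis=0), np.cov(X, rowvar=False)

    def log_density(x, mu, Sigma):
        """Log of the Gaussian density, omitting the shared (2*pi)^(-d/2) factor."""
        d = x - mu
        _, logdet = np.linalg.slogdet(Sigma)
        return -0.5 * (logdet + d @ np.linalg.solve(Sigma, d))

    def qda_classify(x, mu0, Sigma0, mu1, Sigma1):
        """Class 1 exactly when the log-likelihood ratio exceeds log t = 0."""
        log_lr = log_density(x, mu1, Sigma1) - log_density(x, mu0, Sigma0)
        return int(log_lr > 0.0)

    # Toy training data: two Gaussian clouds with different covariances
    rng = np.random.default_rng(0)
    X0 = rng.normal([0.0, 0.0], [1.0, 1.0], size=(200, 2))
    X1 = rng.normal([2.0, 2.0], [0.5, 2.0], size=(200, 2))

    mu0, Sigma0 = fit_gaussian(X0)
    mu1, Sigma1 = fit_gaussian(X1)
    print(qda_classify(np.array([1.8, 2.1]), mu0, Sigma0, mu1, Sigma1))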


Other

While QDA is the most commonly used method for obtaining a classifier, other methods are also possible. One such method is to create a longer measurement vector from the old one by adding all pairwise products of individual measurements. For instance, the vector

(x_1, \; x_2, \; x_3)

would become

(x_1, \; x_2, \; x_3, \; x_1^2, \; x_1 x_2, \; x_1 x_3, \; x_2^2, \; x_2 x_3, \; x_3^2).

Finding a quadratic classifier for the original measurements would then become the same as finding a linear classifier based on the expanded measurement vector (see the sketch below). This observation has been used in extending neural network models; the "circular" case, which corresponds to introducing only the sum of pure quadratic terms \; x_1^2 + x_2^2 + x_3^2 + \cdots \; with no mixed products ( \; x_1 x_2, \; x_1 x_3, \; \ldots \; ), has been proven to be the optimal compromise between extending the classifier's representation power and controlling the risk of overfitting (the Vapnik–Chervonenkis dimension).
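As a sketch of this expansion in Python (the helper name quadratic_features is hypothetical):

    import numpy as np
    from itertools import combinations_with_replacement

    def quadratic_features(x):
        """Append every pairwise product x_i * x_j (i <= j) to the vector x."""
        idx = combinations_with_replacement(range(len(x)), 2)
        return np.concatenate([x, [x[i] * x[j] for i, j in idx]])

    print(quadratic_features(np.array([2.0, 3.0, 5.0])))
    # [ 2.  3.  5.  4.  6. 10.  9. 15. 25.]

Any off-the-shelf linear classifier applied to quadratic_features(x) then implements a quadratic classifier in the original coordinates.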
For linear classifiers based only on dot products, these expanded measurements do not have to be computed explicitly, since the dot product in the higher-dimensional space is simply related to that in the original space. This is an example of the so-called kernel trick, which can be applied to linear discriminant analysis as well as the support vector machine.
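A minimal numerical sketch of this identity, assuming the homogeneous degree-2 case: with the explicit feature map \varphi(\mathbf{x}) = (x_1^2, \ldots, x_n^2, \sqrt{2}\, x_1 x_2, \ldots) , the dot product \varphi(\mathbf{x}) \cdot \varphi(\mathbf{z}) equals the kernel (\mathbf{x} \cdot \mathbf{z})^2 , which is computed entirely in the original space (the function name degree2_map is hypothetical):

    import numpy as np
    from itertools import combinations

    def degree2_map(x):
        """Explicit map whose dot product reproduces the kernel (x . z)**2."""
        cross = [np.sqrt(2.0) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
        return np.concatenate([x * x, cross])

    x = np.array([1.0, 2.0, 3.0])
    z = np.array([4.0, 5.0, 6.0])

    explicit = degree2_map(x) @ degree2_map(z)  # dot product in the expanded space
    kernel = (x @ z) ** 2                       # same value, original space only
    print(np.isclose(explicit, kernel))         # True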

