Conditional Model

Discriminative models, also referred to as conditional models, are a class of models frequently used for classification. They are typically used to solve binary classification problems, i.e. to assign labels, such as pass/fail, win/lose, alive/dead or healthy/sick, to existing datapoints. Types of discriminative models include logistic regression (LR), conditional random fields (CRFs) and decision trees, among many others. Generative model approaches, which use a joint probability distribution instead, include naive Bayes classifiers, Gaussian mixture models, variational autoencoders, generative adversarial networks (GANs) and others.


Definition

Unlike generative modelling, which studies the joint probability P(x,y), discriminative modelling studies the conditional probability P(y|x): it maps a given input x to a class label y, based on the observed training samples. For example, in object recognition, x is likely to be a vector of raw pixels (or features extracted from the raw pixels of the image). Within a probabilistic framework, this is done by modelling the conditional probability distribution P(y|x), which can be used for predicting y from x. Note that a distinction is still drawn between the conditional model and the discriminative model, though the two are often simply categorised together as discriminative models.
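As a concrete illustration of modelling P(y|x) directly, here is a minimal sketch (assuming Python with scikit-learn; the dataset and model settings are illustrative choices, not taken from the source) that fits a logistic-regression classifier to raw pixel vectors and reads off its estimate of P(y|x):

```python
# Minimal sketch: a discriminative classifier estimates P(y | x) directly.
# scikit-learn is assumed to be installed; the digits dataset stands in for
# the "raw pixel" object-recognition example in the text.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)                 # x: 64 pixel values, y: digit label
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)             # discriminative: models P(y | x) only
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test[:1])               # estimated P(y | x) for one test image
print(proba.round(3), "predicted class:", proba.argmax())
```

Nothing here models how the pixels themselves are distributed; the classifier only learns the conditional distribution of the label given the pixels.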


Pure discriminative model vs. conditional model

A ''conditional model'' models the conditional probability distribution P(y|x), while the traditional discriminative model aims to optimize a mapping from the input to the most similar training samples.


Typical discriminative modelling approaches

The following approaches assume that we are given a training data set D=\{(x_i, y_i) \mid i \le N\}, where y_i is the output corresponding to the input x_i.


Linear classifier

We intend to use the function f(x) to simulate the behaviour observed in the training data set by the linear classifier method. Using the joint feature vector \phi(x,y), the decision function is defined as:

:f(x;w)=\arg \max_y w^T \phi(x,y)

According to Memisevic's interpretation, w^T \phi(x,y), also written c(x,y;w), computes a score that measures the compatibility of the input x with the potential output y. The \arg \max then determines the class with the highest score.
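To make the decision rule above concrete, here is a small sketch (NumPy only; the particular joint feature map \phi and the weight values are illustrative assumptions, not from the source) that scores every candidate class with w^T \phi(x,y) and returns the arg max:

```python
import numpy as np

def phi(x, y, n_classes):
    """Joint feature map phi(x, y): copy the input features into the block for class y."""
    out = np.zeros(n_classes * x.size)
    out[y * x.size:(y + 1) * x.size] = x
    return out

def predict(x, w, n_classes):
    """f(x; w) = argmax_y w^T phi(x, y): the class with the highest compatibility score."""
    scores = [w @ phi(x, y, n_classes) for y in range(n_classes)]
    return int(np.argmax(scores))

# Toy example with 3 classes and 4 input features (illustrative numbers only).
rng = np.random.default_rng(0)
n_classes, n_features = 3, 4
w = rng.normal(size=n_classes * n_features)   # would normally be learned; random here
x = rng.normal(size=n_features)
print(predict(x, w, n_classes))
```

With this block-structured \phi, the rule reduces to the familiar one-weight-vector-per-class linear classifier.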


Logistic regression (LR)

Since the 0-1 loss function is commonly used in decision theory, the conditional probability distribution P(y|x;w), where w is a parameter vector estimated from the training data, can be written as follows for the logistic regression model:

:P(y|x;w)= \frac{1}{Z(x;w)} \exp(w^T\phi(x,y)),

with

:Z(x;w)= \textstyle \sum_{y} \exp(w^T\phi(x,y)).

The equation above represents logistic regression. Notice that a major distinction between models is their way of introducing the posterior probability; here the posterior probability is inferred from the parametric model. The parameters can then be estimated by maximizing the log likelihood:

:L(w)=\textstyle \sum_{i} \log p(y^i|x^i;w)

Equivalently, one can minimize the log-loss:

:l^{\log}(x^i, y^i, c(x^i;w)) = -\log p(y^i|x^i;w) = \log Z(x^i;w)-w^T\phi(x^i,y^i)

Since the log-loss is differentiable, a gradient-based method can be used to optimize the model, and a global optimum is guaranteed because the objective function is convex. The gradient of the log likelihood is:

:\frac{\partial L(w)}{\partial w} = \textstyle \sum_{i} \phi(x^i,y^i) - E_{p(y|x^i;w)} \phi(x^i,y)

where E_{p(y|x^i;w)} denotes the expectation under p(y|x^i;w). This approach is computationally efficient when the number of classes is relatively small.
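The training procedure just described can be sketched from scratch as follows (NumPy only; the synthetic data, step size and iteration count are illustrative assumptions): it computes P(y|x;w) as the normalised exponential of w^T\phi(x,y) and performs gradient ascent on the log likelihood using the gradient \sum_i \phi(x^i,y^i) - E_{p(y|x^i;w)}\phi(x^i,y):

```python
import numpy as np

def phi(x, y, n_classes):
    """Joint feature map phi(x, y): input features placed in class y's block."""
    out = np.zeros(n_classes * x.size)
    out[y * x.size:(y + 1) * x.size] = x
    return out

def cond_prob(x, w, n_classes):
    """P(y | x; w) = exp(w^T phi(x, y)) / Z(x; w) for every candidate class y."""
    scores = np.array([w @ phi(x, y, n_classes) for y in range(n_classes)])
    scores -= scores.max()                       # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def log_likelihood_grad(X, Y, w, n_classes):
    """Gradient of L(w): sum_i phi(x_i, y_i) - E_{p(y | x_i; w)} phi(x_i, y)."""
    grad = np.zeros_like(w)
    for x, y in zip(X, Y):
        p = cond_prob(x, w, n_classes)
        grad += phi(x, y, n_classes)
        grad -= sum(p[k] * phi(x, k, n_classes) for k in range(n_classes))
    return grad

# Toy training run on synthetic, class-dependent data (illustrative only).
rng = np.random.default_rng(1)
n_classes, n_features, n_samples = 3, 2, 60
Y = rng.integers(0, n_classes, size=n_samples)
X = rng.normal(size=(n_samples, n_features)) + Y[:, None]
w = np.zeros(n_classes * n_features)
for _ in range(200):
    w += 0.01 * log_likelihood_grad(X, Y, w, n_classes)   # gradient ascent on L(w)
print("training accuracy:",
      np.mean([cond_prob(x, w, n_classes).argmax() == y for x, y in zip(X, Y)]))
```

Because the log likelihood is concave in w (equivalently, the log-loss is convex), this simple gradient procedure converges toward the global optimum.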


Contrast with generative model


Contrast in approaches

Let's say we are given the m class labels and n feature variables, Y:\{y_1, y_2, \ldots, y_m\}, X:\{x_1, x_2, \ldots, x_n\}, as the training samples. A generative model models the joint probability P(x,y), where x is the input and y is the label, and predicts the most probable known label \widetilde{y}\in Y for an unknown input \widetilde{x} using Bayes' theorem. Discriminative models, as opposed to generative models, do not allow one to generate samples from the joint distribution of observed and target variables. However, for tasks such as classification and regression that do not require the joint distribution, discriminative models can yield superior performance (in part because they have fewer variables to compute). On the other hand, generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks. In addition, most discriminative models are inherently supervised and cannot easily support unsupervised learning. Application-specific details ultimately dictate the suitability of selecting a discriminative versus a generative model.

Discriminative models and generative models also differ in how they introduce the posterior probability. To attain the least expected loss, the misclassification rate should be minimized. In the discriminative model, the posterior probability P(y|x) is inferred from a parametric model whose parameters are estimated from the training data, with point estimates obtained by maximizing the likelihood or by computing a distribution over the parameters. On the other hand, since generative models focus on the joint probability, the class posterior probability is obtained from Bayes' theorem:

:P(y|x) = \frac{P(x|y)P(y)}{P(x)}=\frac{P(x|y)P(y)}{\textstyle \sum_{y'} P(x|y')P(y')}.
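To illustrate the generative route through Bayes' theorem, here is a minimal sketch (SciPy assumed; the one-dimensional Gaussian class-conditionals and the priors are invented for illustration): the class posterior P(y|x) is computed from P(x|y) and P(y) rather than being modelled directly:

```python
from scipy.stats import norm

# Illustrative 1-D Gaussian class-conditional densities P(x | y) and class priors P(y).
class_params = {0: (-1.0, 1.0), 1: (2.0, 1.5)}    # y -> (mean, std); assumed known here
priors = {0: 0.6, 1: 0.4}

def posterior(x):
    """Bayes' theorem: P(y | x) = P(x | y) P(y) / sum_y' P(x | y') P(y')."""
    joint = {y: norm.pdf(x, mu, sd) * priors[y] for y, (mu, sd) in class_params.items()}
    z = sum(joint.values())                        # the evidence P(x)
    return {y: j / z for y, j in joint.items()}

print(posterior(0.5))   # posterior over both classes for a single observation x = 0.5
```

A discriminative model would instead parameterise P(y|x) directly and never commit to a particular form for P(x|y).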


Advantages and disadvantages in application

In repeated experiments in which logistic regression and naive Bayes are applied to binary classification tasks, discriminative learning attains a lower asymptotic error, while the generative model reaches its (higher) asymptotic error faster. However, in Ulusoy and Bishop's joint work, ''Comparison of Generative and Discriminative Techniques for Object Detection and Classification'', they state that the above statement is true only when the model is appropriate for the data (i.e. the data distribution is correctly modelled by the generative model).
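The comparison described above can be reproduced in spirit with the following sketch (scikit-learn assumed; the synthetic dataset and training-set sizes are illustrative choices), which tracks the test error of logistic regression and Gaussian naive Bayes as the amount of training data grows:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification problem; the last 1000 points are held out for testing.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_test, y_test = X[4000:], y[4000:]

for n in (20, 100, 500, 2000):                                 # growing training sets
    lr = LogisticRegression(max_iter=1000).fit(X[:n], y[:n])   # discriminative
    nb = GaussianNB().fit(X[:n], y[:n])                        # generative
    print(f"n={n:5d}  LR error={1 - lr.score(X_test, y_test):.3f}  "
          f"NB error={1 - nb.score(X_test, y_test):.3f}")
```

Whether the naive Bayes curve flattens out above or below the logistic-regression curve depends on how well its modelling assumptions match the data, which is exactly Ulusoy and Bishop's caveat.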


Advantages

Significant advantages of using discriminative modelling are:
* Higher accuracy, which mostly leads to better learning results.
* It allows simplification of the input and provides a direct approach to P(y|x).
* It saves computational resources.
* It yields lower asymptotic errors.

Compared with the advantages of using generative modelling:
* Generative models take all data into consideration, which can result in slower processing (a disadvantage).
* They require fewer training samples.
* They provide a flexible framework that can easily cooperate with other needs of the application.


Disadvantages

* The training method usually requires multiple numerical optimization techniques.
* Similarly, by definition, the discriminative model needs a combination of multiple subtasks to solve a complex real-world problem.


Optimizations in applications

Since both ways of modelling have advantages and disadvantages, combining the two approaches is often good practice. For example, in Marras' article ''A Joint Discriminative Generative Model for Deformable Model Construction and Classification'', he and his co-authors apply a combination of the two modelling approaches to face classification and obtain higher accuracy than the traditional approach. Similarly, Kelm proposed combining the two approaches for pixel classification in his article ''Combining Generative and Discriminative Methods for Pixel Classification with Multi-Conditional Learning''.

During the process of extracting discriminative features prior to clustering, principal component analysis (PCA), though commonly used, is not a necessarily discriminative approach. In contrast, linear discriminant analysis (LDA) is a discriminative one, and it provides an efficient way of mitigating the disadvantage listed above: since the discriminative model needs a combination of multiple subtasks before classification, LDA addresses this by reducing the dimensionality of the problem.
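As a small illustration of the PCA-versus-LDA point (scikit-learn assumed; the dataset is an arbitrary choice): PCA ignores the class labels when reducing dimensionality, whereas LDA uses them, which is what makes it a discriminative projection:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)       # unsupervised: the labels y never enter
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised: uses y

print(X_pca.shape, X_lda.shape)                    # both reduce to two dimensions
```

The LDA projection is chosen to separate the classes, which is why it can serve as the dimensionality-reducing subtask mentioned above.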


Types

Examples of discriminative models include:
* Logistic regression, a type of generalized linear regression used for predicting binary or categorical outputs (also known as maximum entropy classifiers)
* Boosting (meta-algorithm)
* Conditional random fields
* Linear regression
* Random forests


See also

* Generative model

