Binary classification is the task of classifying the elements of a set into one of two groups (each called a ''class''). Typical binary classification problems include:
* Medical testing to determine if a patient has a certain disease or not;
* Quality control in industry, deciding whether a specification has been met;
* In information retrieval, deciding whether a page should be in the result set of a search or not;
* In administration, deciding whether someone should be issued with a driving licence or not;
* In cognition, deciding whether an object is food or not food.
When measuring the accuracy of a binary classifier, the simplest way is to count the errors. In practice, however, one of the two classes is often more important than the other, so the counts of each of the two types of error are of interest separately. For example, in medical testing, detecting a disease when it is not present (a ''false positive'') is considered differently from not detecting a disease when it is present (a ''false negative'').
Four outcomes
Given a classification of a specific data set, there are four basic combinations of actual data category and assigned category:
true positives TP (correct positive assignments),
true negatives TN (correct negative assignments),
false positives FP (incorrect positive assignments), and
false negatives FN (incorrect negative assignments).
These can be arranged into a 2×2 contingency table, with rows corresponding to the actual value (condition positive or condition negative) and columns corresponding to the classification value (test outcome positive or test outcome negative).
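As a rough illustration of the four outcomes, the sketch below tallies them from paired labels and prints them in the row and column layout described above. The function name `confusion_counts` and the sample data are hypothetical, and labels are assumed to be booleans with `True` meaning positive.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally TP, FN, FP, TN from paired boolean labels (True = positive)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Each key is an (actual, predicted) pair; missing pairs count as 0.
    tally = Counter(zip(actual, predicted))
    return {
        "TP": tally[(True, True)],
        "FN": tally[(True, False)],
        "FP": tally[(False, True)],
        "TN": tally[(False, False)],
    }

# Hypothetical data: 3 condition-positive and 5 condition-negative cases.
actual    = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

counts = confusion_counts(actual, predicted)

# Rows are the actual condition, columns are the test outcome.
print("               test +   test -")
print(f"condition +   {counts['TP']:>6}   {counts['FN']:>6}")
print(f"condition -   {counts['FP']:>6}   {counts['TN']:>6}")
```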
Evaluation
From tallies of the four basic outcomes, there are many approaches that can be used to measure the accuracy of a classifier or predictor. Different fields have different preferences.
The eight basic ratios
A common approach to evaluation is to begin by computing two ratios of a standard pattern. There are eight basic ratios of this form that one can compute from the contingency table, which come in four complementary pairs (each pair summing to 1). These are obtained by dividing each of the four numbers by the sum of its row or column, yielding eight numbers, which can be referred to generically in the form "true positive row ratio" or "false negative column ratio".
There are thus two pairs of column ratios and two pairs of row ratios, and one can summarize these with four numbers by choosing one ratio from each pair – the other four numbers are the complements.
The row ratios are:
* true positive rate (TPR) = TP/(TP+FN), also known as sensitivity or recall. This is the proportion of the ''population with the condition'' for which the test is correct.
** with complement the false negative rate (FNR) = FN/(TP+FN)
* true negative rate (TNR) = TN/(TN+FP), also known as specificity (SPC),
** with complement the false positive rate (FPR) = FP/(TN+FP).
The row ratios are independent of prevalence.
The column ratios are:
* positive predictive value (PPV) = TP/(TP+FP), also known as precision. This is the proportion of the ''population with a given test result'' for which the test is correct.
** with complement the false discovery rate (FDR) = FP/(TP+FP)
* negative predictive value (NPV) = TN/(TN+FN)
** with complement the false omission rate (FOR) = FN/(TN+FN).
The column ratios depend on prevalence.
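The eight ratios can be computed directly from the four counts. The following is a minimal sketch: the function name `basic_ratios` and the counts are made up for illustration, and it assumes no row or column total is zero. The second call multiplies the condition-negative counts by ten, which lowers prevalence, to show that the row ratios stay the same while the column ratios move.

```python
def basic_ratios(tp, fn, fp, tn):
    """Return the eight basic ratios of a 2x2 contingency table."""
    return {
        # Row ratios: divide by the actual-condition totals.
        "TPR": tp / (tp + fn),
        "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (tn + fp),
        # Column ratios: divide by the test-outcome totals.
        "PPV": tp / (tp + fp),
        "FDR": fp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FOR": fn / (tn + fn),
    }

# Hypothetical counts chosen only for illustration.
ratios = basic_ratios(tp=90, fn=10, fp=30, tn=870)

# Same classifier behaviour, but ten times as many condition-negative cases.
rarer = basic_ratios(tp=90, fn=10, fp=300, tn=8700)

for name in ("TPR", "TNR", "PPV", "NPV"):
    print(f"{name}: {ratios[name]:.3f} -> {rarer[name]:.3f}")
```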
In diagnostic testing, the main ratios used are the true row ratios (true positive rate and true negative rate), where they are known as sensitivity and specificity. In information retrieval, the main ratios are the true positive ratios, one row and one column (true positive rate and positive predictive value), where they are known as recall and precision.
Cullerne Bown has suggested a flow chart for determining which pair of indicators should be used when. Otherwise, there is no general rule for deciding. There is also no general agreement on how the pair of indicators should be used to decide on concrete questions, such as when to prefer one classifier over another.
One can also take ratios of these ratios, yielding four likelihood ratios (two from the column ratios and two from the row ratios). This is primarily done for the row (condition) ratios, yielding the likelihood ratios in diagnostic testing. Taking the ratio of one of these pairs of likelihood ratios yields a final ratio, the diagnostic odds ratio (DOR). This can also be defined directly as (TP×TN)/(FP×FN) = (TP/FN)/(FP/TN); it has a useful interpretation as an odds ratio and is prevalence-independent.
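A short sketch can check that the two routes to the DOR agree. It uses the usual diagnostic-testing definitions, positive likelihood ratio LR+ = TPR/FPR and negative likelihood ratio LR− = FNR/TNR; the function names and counts are hypothetical, and all four cells are assumed to be non-zero.

```python
def likelihood_ratios(tp, fn, fp, tn):
    """Return (LR+, LR-) computed from the row ratios."""
    tpr = tp / (tp + fn)
    fnr = fn / (tp + fn)
    tnr = tn / (tn + fp)
    fpr = fp / (tn + fp)
    lr_pos = tpr / fpr  # how much a positive result raises the odds
    lr_neg = fnr / tnr  # how much a negative result lowers the odds
    return lr_pos, lr_neg

def diagnostic_odds_ratio(tp, fn, fp, tn):
    """Direct definition: (TP x TN) / (FP x FN)."""
    return (tp * tn) / (fp * fn)

# Hypothetical counts chosen only for illustration.
lr_pos, lr_neg = likelihood_ratios(tp=90, fn=10, fp=30, tn=870)
dor = diagnostic_odds_ratio(tp=90, fn=10, fp=30, tn=870)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.4f}")
print(f"DOR = {dor:.1f}, LR+/LR- = {lr_pos / lr_neg:.1f}")
```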
Other metrics
There are a number of other metrics, most simply the accuracy or Fraction Correct (FC), which measures the fraction of all instances that are correctly categorized; the complement is the Fraction Incorrect (FiC). The F-score combines precision and recall into one number via a choice of weighting, most simply equal weighting, as the balanced F-score (F1 score). Some metrics come from regression coefficients: the markedness and the informedness, and their geometric mean, the Matthews correlation coefficient. Other metrics include Youden's J statistic, the uncertainty coefficient, the phi coefficient, and Cohen's kappa.
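Several of these single-number summaries can be sketched from the same four counts. The example below is illustrative only: `summary_metrics` and the counts are hypothetical, no denominator is assumed to be zero, and the final check relies on informedness and markedness both being positive, in which case their geometric mean matches the Matthews correlation coefficient.

```python
import math

def summary_metrics(tp, fn, fp, tn):
    """Accuracy, balanced F-score, informedness, markedness and MCC."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    informedness = recall + tn / (tn + fp) - 1   # TPR + TNR - 1
    markedness = precision + tn / (tn + fn) - 1  # PPV + NPV - 1
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / total,
        "f1": 2 * precision * recall / (precision + recall),
        "informedness": informedness,
        "markedness": markedness,
        "mcc": mcc,
    }

# Hypothetical counts chosen only for illustration.
m = summary_metrics(tp=90, fn=10, fp=30, tn=870)

for name, value in m.items():
    print(f"{name}: {value:.4f}")

# Geometric mean of informedness and markedness, for comparison with MCC.
print(f"sqrt(informedness * markedness): "
      f"{math.sqrt(m['informedness'] * m['markedness']):.4f}")
```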
Statistical binary classification
Statistical classification is a problem studied in machine learning in which the classification is performed on the basis of a classification rule. It is a type of supervised learning, a method of machine learning where the categories are predefined, and is used to categorize new probabilistic observations into said categories. When there are only two categories the problem is known as statistical binary classification.
Some of the methods commonly used for binary classification are:
* Decision trees
* Random forests
* Bayesian networks
* Support vector machines
* Neural networks
* Logistic regression
* Probit model
* Genetic programming
* Multi expression programming
* Linear genetic programming
Each classifier is best in only a select domain based upon the number of observations, the dimensionality of the feature vector, the noise in the data and many other factors. For example, random forests perform better than SVM classifiers for 3D point clouds.
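To make one of the listed methods concrete, here is a from-scratch sketch of logistic regression on a single feature, fitted by plain gradient descent on the log loss. It is not a recommended implementation: the toy data, learning rate, epoch count and function names are all assumptions chosen for illustration, and a real project would normally use an established library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b of a one-feature logistic model."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Derivative of the log loss with respect to the logit.
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        # Step against the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Classify as positive when the modelled probability reaches the threshold."""
    return sigmoid(w * x + b) >= threshold

# Hypothetical, centred and linearly separable toy data:
# the label is 1 when the feature is positive.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(xs, ys)
predictions = [int(predict(w, b, x)) for x in xs]
print(f"w = {w:.3f}, b = {b:.3f}")
print("predictions:", predictions)
```

The final comparison against a 0.5 threshold is itself a dichotomization of a continuous score, which is the subject of the next section.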
Converting continuous values to binary
Binary classification may be a form of dichotomization in which a continuous function is transformed into a binary variable. Tests whose results are of continuous values, such as most blood values, can artificially be made binary by defining a cutoff value, with test results being designated as positive or negative depending on whether the resultant value is higher or lower than the cutoff.
However, such conversion causes a loss of information, as the resultant binary classification does not tell ''how much'' above or below the cutoff a value is. As a result, when converting a continuous value that is close to the cutoff to a binary one, the resultant positive or negative predictive value is generally higher than the predictive value given directly from the continuous value. In such cases, designating the test as either positive or negative gives the appearance of an inappropriately high certainty, while the value is in fact in an interval of uncertainty. For example, with the urine concentration of hCG as a continuous value, a urine pregnancy test that measured 52 mIU/ml of hCG may show as "positive" with 50 mIU/ml as cutoff, but is in fact in an interval of uncertainty, which may be apparent only by knowing the original continuous value. On the other hand, a test result very far from the cutoff generally has a resultant positive or negative predictive value that is lower than the predictive value given from the continuous value. For example, a urine hCG value of 200,000 mIU/ml confers a very high probability of pregnancy, but after conversion to a binary value it shows as just as "positive" as the result of 52 mIU/ml.
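The information loss in the hCG example can be seen in a few lines. In this sketch the helper `dichotomize` is hypothetical, and treating a value exactly equal to the cutoff as negative is an assumption, since the text only speaks of values higher or lower than the cutoff.

```python
def dichotomize(value, cutoff):
    """Return 'positive' if value is above the cutoff, else 'negative'.

    Assumption: a value exactly at the cutoff is treated as negative.
    """
    return "positive" if value > cutoff else "negative"

CUTOFF_MIU_PER_ML = 50  # cutoff used in the hCG example above

for hcg in (52, 200_000):
    label = dichotomize(hcg, CUTOFF_MIU_PER_ML)
    margin = hcg - CUTOFF_MIU_PER_ML
    # The label is identical for both values; only the margin, which the
    # binary result discards, shows how far each one is from the cutoff.
    print(f"hCG {hcg} mIU/ml -> {label} (margin above cutoff: {margin} mIU/ml)")
```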
See also
* Approximate membership query filter
* Examples of Bayesian inference
* Classification rule
* Confusion matrix
* Detection theory
* Kernel methods
* Multiclass classification
* Multi-label classification
* One-class classification
* Prosecutor's fallacy
* Receiver operating characteristic
* Thresholding (image processing)
* Uncertainty coefficient, also known as proficiency
* Qualitative property
* Precision and recall (equivalent classification schema)