Binary classification is the task of

classifying Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves (for example through cluster analysis). Examples include diagnostic tests, identif ...

the elements of a

set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...

into one of two groups (each called ''class''). Typical binary classification problems include: *

Medical test A medical test is a medical procedure performed to detect, diagnose, or monitor diseases, disease processes, susceptibility, or to determine a course of treatment. Medical tests such as, physical and visual exams, diagnostic imaging, genetic ...

ing to determine if a patient has a certain disease or not; *

Quality control Quality control (QC) is a process by which entities review the quality of all factors involved in production. ISO 9000 defines quality control as "a part of quality management focused on fulfilling quality requirements". This approach plac ...

in industry, deciding whether a specification has been met; * In

information retrieval Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form ...

, deciding whether a page should be in the result set of a search or not * In

administration Administration may refer to: Management of organizations * Management, the act of directing people towards accomplishing a goal: the process of dealing with or controlling things or people. ** Administrative assistant, traditionally known as a se ...

, deciding whether someone should be issued with a driving licence or not * In

cognition Cognition is the "mental action or process of acquiring knowledge and understanding through thought, experience, and the senses". It encompasses all aspects of intellectual functions and processes such as: perception, attention, thought, ...

, deciding whether an object is food or not food. When measuring the accuracy of a binary classifier, the simplest way is to count the errors. But in the real world often one of the two classes is more important, so that the number of both of the different types of errors is of interest. For example, in medical testing, detecting a disease when it is not present (a '' false positive'') is considered differently from not detecting a disease when it is present (a '' false negative'').

Four outcomes

Given a classification of a specific data set, there are four basic combinations of actual data category and assigned category: true positives TP (correct positive assignments), true negatives TN (correct negative assignments), false positives FP (incorrect positive assignments), and false negatives FN (incorrect negative assignments). These can be arranged into a 2×2 contingency table, with rows corresponding to actual value – condition positive or condition negative – and columns corresponding to classification value – test outcome positive or test outcome negative.

Evaluation

From tallies of the four basic outcomes, there are many approaches that can be used to measure the accuracy of a classifier or predictor. Different fields have different preferences.

The eight basic ratios

A common approach to evaluation is to begin by computing two ratios of a standard pattern. There are eight basic ratios of this form that one can compute from the contingency table, which come in four complementary pairs (each pair summing to 1). These are obtained by dividing each of the four numbers by the sum of its row or column, yielding eight numbers, which can be referred to generically in the form "true positive row ratio" or "false negative column ratio". There are thus two pairs of column ratios and two pairs of row ratios, and one can summarize these with four numbers by choosing one ratio from each pair – the other four numbers are the complements. The row ratios are: * true positive rate (TPR) = (TP/(TP+FN)), aka sensitivity or recall. These are the proportion of the ''population with the condition'' for which the test is correct. **with complement the false negative rate (FNR) = (FN/(TP+FN)) * true negative rate (TNR) = (TN/(TN+FP), aka specificity (SPC), **with complement false positive rate (FPR) = (FP/(TN+FP)), also called independent of

prevalence In epidemiology, prevalence is the proportion of a particular population found to be affected by a medical condition (typically a disease or a risk factor such as smoking or seatbelt use) at a specific time. It is derived by comparing the number o ...

The column ratios are: * positive predictive value (PPV, aka precision) (TP/(TP+FP)). These are the proportion of the ''population with a given test result'' for which the test is correct. **with complement the false discovery rate (FDR) (FP/(TP+FP)) * negative predictive value (NPV) (TN/(TN+FN)) **with complement the false omission rate (FOR) (FN/(TN+FN)), also called dependence on prevalence. In diagnostic testing, the main ratios used are the true column ratios – true positive rate and true negative rate – where they are known as

sensitivity and specificity In medicine and statistics, sensitivity and specificity mathematically describe the accuracy of a test that reports the presence or absence of a medical condition. If individuals who have the condition are considered "positive" and those who do ...

. In informational retrieval, the main ratios are the true positive ratios (row and column) – positive predictive value and true positive rate – where they are known as

precision and recall In pattern recognition, information retrieval, object detection and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space. Precision (also calle ...

. Cullerne Bown has suggested a flow chart for determining which pair of indicators should be used when. Otherwise, there is no general rule for deciding. There is also no general agreement on how the pair of indicators should be used to decide on concrete questions, such as when to prefer one classifier over another. One can take ratios of a complementary pair of ratios, yielding four likelihood ratios (two column ratio of ratios, two row ratio of ratios). This is primarily done for the column (condition) ratios, yielding likelihood ratios in diagnostic testing. Taking the ratio of one of these groups of ratios yields a final ratio, the diagnostic odds ratio (DOR). This can also be defined directly as (TP×TN)/(FP×FN) = (TP/FN)/(FP/TN); this has a useful interpretation – as an

odds ratio An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of event A taking place in the presence of B, and the odds of A in the absence of B ...

– and is prevalence-independent.

Other metrics

There are a number of other metrics, most simply the accuracy or Fraction Correct (FC), which measures the fraction of all instances that are correctly categorized; the complement is the Fraction Incorrect (FiC). The F-score combines precision and recall into one number via a choice of weighing, most simply equal weighing, as the balanced F-score ( F1 score). Some metrics come from regression coefficients: the

markedness In linguistics and social sciences, markedness is the state of standing out as nontypical or divergent as opposed to regular or common. In a marked–unmarked relation, one term of an opposition is the broader, dominant one. The dominant defau ...

and the informedness, and their

geometric mean In mathematics, the geometric mean is a mean or average which indicates a central tendency of a finite collection of positive real numbers by using the product of their values (as opposed to the arithmetic mean which uses their sum). The geometri ...

, the Matthews correlation coefficient. Other metrics include

Youden's J statistic Youden's J statistic (also called Youden's index) is a single statistic that captures the performance of a dichotomy, dichotomous diagnostic test. In meteorology, this statistic is referred to as Peirce Skill Score (PSS), Hanssen–Kuipers Discrim ...

, the

uncertainty coefficient In statistics, the uncertainty coefficient, also called proficiency, entropy coefficient or Theil's U, is a measure of nominal Association (statistics), association. It was first introduced by Henri Theil and is based on the concept of informatio ...

, the phi coefficient, and Cohen's kappa.

Statistical binary classification

Statistical classification When classification is performed by a computer, statistical methods are normally used to develop the algorithm. Often, the individual observations are analyzed into a set of quantifiable properties, known variously as explanatory variables or ''f ...

is a problem studied in

machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...

in which the classification is performed on the basis of a classification rule. It is a type of

supervised learning In machine learning, supervised learning (SL) is a paradigm where a Statistical model, model is trained using input objects (e.g. a vector of predictor variables) and desired output values (also known as a ''supervisory signal''), which are often ...

, a method of machine learning where the categories are predefined, and is used to categorize new probabilistic observations into said categories. When there are only two categories the problem is known as statistical binary classification. Some of the methods commonly used for binary classification are: * Decision trees * Random forests * Bayesian networks *

Support vector machine In machine learning, support vector machines (SVMs, also support vector networks) are supervised max-margin models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laborato ...

s *

Neural networks A neural network is a group of interconnected units called neurons that send signals to one another. Neurons can be either Cell (biology), biological cells or signal pathways. While individual neurons are simple, many of them together in a netwo ...

Logistic regression In statistics, a logistic model (or logit model) is a statistical model that models the logit, log-odds of an event as a linear function (calculus), linear combination of one or more independent variables. In regression analysis, logistic regres ...

* Probit model *

Genetic Programming Genetic programming (GP) is an evolutionary algorithm, an artificial intelligence technique mimicking natural evolution, which operates on a population of programs. It applies the genetic operators selection (evolutionary algorithm), selection a ...

* Multi expression programming * Linear genetic programming Each classifier is best in only a select domain based upon the number of observations, the dimensionality of the feature vector, the noise in the data and many other factors. For example, random forests perform better than SVM classifiers for 3D point clouds.

Converting continuous values to binary

Binary classification may be a form of dichotomization in which a continuous function is transformed into a binary variable. Tests whose results are of continuous values, such as most blood values, can artificially be made binary by defining a cutoff value, with test results being designated as positive or negative depending on whether the resultant value is higher or lower than the cutoff. However, such conversion causes a loss of information, as the resultant binary classification does not tell ''how much'' above or below the cutoff a value is. As a result, when converting a continuous value that is close to the cutoff to a binary one, the resultant positive or negative predictive value is generally higher than the predictive value given directly from the continuous value. In such cases, the designation of the test of being either positive or negative gives the appearance of an inappropriately high certainty, while the value is in fact in an interval of uncertainty. For example, with the urine concentration of hCG as a continuous value, a urine pregnancy test that measured 52 mIU/ml of hCG may show as "positive" with 50 mIU/ml as cutoff, but is in fact in an interval of uncertainty, which may be apparent only by knowing the original continuous value. On the other hand, a test result very far from the cutoff generally has a resultant positive or negative predictive value that is lower than the predictive value given from the continuous value. For example, a urine hCG value of 200,000 mIU/ml confers a very high probability of pregnancy, but conversion to binary values results in that it shows just as "positive" as the one of 52 mIU/ml.

References

Bibliography

Nello Cristianini Nello Cristianini (born 1968) is a professor of Artificial Intelligence in the Department of Computer Science at the University of Bath. Education Cristianini holds a degree in physics from the University of Trieste, a Master in computational ...

and John Shawe-Taylor. ''An Introduction to Support Vector Machines and other kernel-based learning methods''. Cambridge University Press, 2000. '

SVM Book)'' * John Shawe-Taylor and Nello Cristianini. ''Kernel Methods for Pattern Analysis''. Cambridge University Press, 2004.
Website for the book
* Bernhard Schölkopf and A. J. Smola: ''Learning with Kernels''. MIT Press, Cambridge, Massachusetts, 2002. {{Statistics, analysis, , state=expanded Statistical classification Machine learning