Youden's J statistic (also called Youden's index) is a single statistic that captures the performance of a
dichotomous
A dichotomy () is a partition of a set, partition of a whole (or a set) into two parts (subsets). In other words, this couple of parts must be
* jointly exhaustive: everything must belong to one part or the other, and
* mutually exclusive: nothi ...
diagnostic test. In
meteorology
Meteorology is the scientific study of the Earth's atmosphere and short-term atmospheric phenomena (i.e. weather), with a focus on weather forecasting. It has applications in the military, aviation, energy production, transport, agricultur ...
, this statistic is referred to as Peirce Skill Score (PSS), Hanssen–Kuipers Discriminant (HKD), or True Skill Statistic (TSS).
(Bookmaker) Informedness is its generalization to the multiclass case and estimates the probability of an
informed decision.
Definition
Youden's ''J'' statistic is
:
with the two right-hand quantities being
sensitivity and specificity
In medicine and statistics, sensitivity and specificity mathematically describe the accuracy of a test that reports the presence or absence of a medical condition. If individuals who have the condition are considered "positive" and those who do ...
. Thus the expanded formula is:
:
In this equation, TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.
The index was suggested by
W. J. Youden in 1950
as a way of summarising the performance of a diagnostic test; however, the formula was earlier published in ''Science'' by
C. S. Peirce in 1884. Its value ranges from -1 through 1 (inclusive),
and has a zero value when a diagnostic test gives the same proportion of positive results for groups with and without the disease, i.e the test is useless. A value of 1 indicates that there are no false positives or false negatives, i.e. the test is perfect. The index gives equal weight to false positive and false negative values, so all tests with the same value of the index give the same proportion of total misclassified results. While it is possible to obtain a value of less than zero from this equation, e.g. Classification yields only False Positives and False Negatives, a value of less than zero just indicates that the positive and negative labels have been switched. After correcting the labels the result will then be in the 0 through 1 range.

Youden's index is often used in conjunction with
receiver operating characteristic
A receiver operating characteristic curve, or ROC curve, is a graph of a function, graphical plot that illustrates the performance of a binary classifier model (can be used for multi class classification as well) at varying threshold values. ROC ...
(ROC) analysis.
The index is defined for all points of an ROC curve, and the maximum value of the index may be used as a criterion for selecting the optimum cut-off point when a diagnostic test gives a numeric rather than a dichotomous result. The index is represented graphically as the height above the chance line, and it is also equivalent to the area under the curve subtended by a single operating point.
Youden's index is also known as deltaP'
and generalizes from the dichotomous to the multiclass case as informedness.
The use of a single index is "not generally to be recommended",
[Everitt B.S. (2002) The Cambridge Dictionary of Statistics. CUP ] but informedness or Youden's index is the probability of an informed decision (as opposed to a random guess) and takes into account all predictions.
An unrelated but commonly used combination of basic statistics from
information retrieval
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form ...
is the
F-score, being a (possibly weighted) harmonic mean of
recall and precision where
recall =
sensitivity = true positive rate. But
specificity and
precision are totally different measures. F-score, like recall and precision, only considers the so-called positive predictions, with recall being the probability of predicting just the positive class, precision being the probability of a positive prediction being correct, and F-score equating these probabilities under the effective assumption that the positive labels and the positive predictions should have the same distribution and
prevalence
In epidemiology, prevalence is the proportion of a particular population found to be affected by a medical condition (typically a disease or a risk factor such as smoking or seatbelt use) at a specific time. It is derived by comparing the number o ...
,
similar to the assumption underlying of
Fleiss' kappa. Youden's J, Informedness, Recall, Precision and F-score are intrinsically undirectional, aiming to assess the
deductive
Deductive reasoning is the process of drawing valid inferences. An inference is valid if its conclusion follows logically from its premises, meaning that it is impossible for the premises to be true and the conclusion to be false. For example, th ...
effectiveness of predictions in the direction proposed by a rule, theory or classifier. DeltaP is Youden's J used to assess the reverse or
abductive direction,
(and generalizes to the multiclass case as
Markedness
In linguistics and social sciences, markedness is the state of standing out as nontypical or divergent as opposed to regular or common. In a marked–unmarked relation, one term of an opposition is the broader, dominant one. The dominant defau ...
), matching well human learning of
associations; rules and,
superstition
A superstition is any belief or practice considered by non-practitioners to be irrational or supernatural, attributed to fate or magic (supernatural), magic, perceived supernatural influence, or fear of that which is unknown. It is commonly app ...
s as we model possible
causation;,
while correlation and kappa evaluate bidirectionally.
Matthews correlation coefficient
In statistics, the phi coefficient, or mean square contingency coefficient, denoted by ''φ'' or ''r'φ'', is a measure of association for two binary variables.
In machine learning, it is known as the Matthews correlation coefficient (MCC) an ...
is the
geometric mean
In mathematics, the geometric mean is a mean or average which indicates a central tendency of a finite collection of positive real numbers by using the product of their values (as opposed to the arithmetic mean which uses their sum). The geometri ...
of the
regression coefficient of the dichotomous problem and its
dual, where the component regression coefficients of the Matthews correlation coefficient are deltaP and deltaP' (that is Youden's J or Pierce's I).
The main article on
Matthews correlation coefficient
In statistics, the phi coefficient, or mean square contingency coefficient, denoted by ''φ'' or ''r'φ'', is a measure of association for two binary variables.
In machine learning, it is known as the Matthews correlation coefficient (MCC) an ...
discusses two different generalizations to the multiclass case, one being the analogous geometric mean of Informedness and Markedness.
Kappa statistics such as
Fleiss' kappa and
Cohen's kappa are methods for calculating
inter-rater reliability
In statistics, inter-rater reliability (also called by various similar names, such as inter-rater agreement, inter-rater concordance, inter-observer reliability, inter-coder reliability, and so on) is the degree of agreement among independent obse ...
based on different assumptions about the marginal or prior distributions, and are increasingly used as ''chance corrected'' alternatives to
accuracy
Accuracy and precision are two measures of ''observational error''.
''Accuracy'' is how close a given set of measurements (observations or readings) are to their ''true value''.
''Precision'' is how close the measurements are to each other.
The ...
in other contexts (including the multiclass case). Fleiss' kappa, like F-score, assumes that both variables are drawn from the same distribution and thus have the same expected prevalence, while
Cohen's kappa assumes that the variables are drawn from distinct distributions and referenced to a model of
expectation that assumes
prevalence
In epidemiology, prevalence is the proportion of a particular population found to be affected by a medical condition (typically a disease or a risk factor such as smoking or seatbelt use) at a specific time. It is derived by comparing the number o ...
s are independent.
[{{cite conference , first=David M W , last=Powers , date=2012 , title=The Problem with Kappa , conference=Conference of the European Chapter of the Association for Computational Linguistics , pages=345–355 , hdl=2328/27160 ]
When the true
prevalence
In epidemiology, prevalence is the proportion of a particular population found to be affected by a medical condition (typically a disease or a risk factor such as smoking or seatbelt use) at a specific time. It is derived by comparing the number o ...
s for the two positive variables are equal as assumed in Fleiss kappa and F-score, that is the number of positive predictions matches the number of positive classes in the dichotomous (two class) case, the different kappa and correlation measure collapse to identity with Youden's J, and recall, precision and F-score are similarly identical with
accuracy
Accuracy and precision are two measures of ''observational error''.
''Accuracy'' is how close a given set of measurements (observations or readings) are to their ''true value''.
''Precision'' is how close the measurements are to each other.
The ...
.
References
Statistical classification
Biostatistics