There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. Calibration can mean
:*a reverse process to regression, where instead of a future dependent variable being predicted from known explanatory variables, a known observation of the dependent variable is used to predict a corresponding explanatory variable;
:*procedures in statistical classification to determine class membership probabilities which assess the uncertainty of a given new observation belonging to each of the already established classes.
In addition, calibration is used in statistics with the usual general meaning of calibration. For example, model calibration can also be used to refer to Bayesian inference about the value of a model's parameters, given some data set, or more generally to any type of fitting of a statistical model. As Philip Dawid puts it, "a forecaster is ''well calibrated'' if, for example, of those events to which he assigns a probability 30 percent, the long-run proportion that actually occurs turns out to be 30 percent."
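Dawid's criterion can be checked on a finite record of forecasts by grouping events by their stated probability and comparing each group with its observed frequency. The following is a minimal sketch; the function name `calibration_table` and the forecast record are hypothetical, and a finite sample only approximates the long-run proportion in the definition.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Group events by stated probability.

    Returns {stated_probability: (event_count, observed_frequency)}.
    """
    groups = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        groups[p].append(y)
    return {p: (len(ys), sum(ys) / len(ys)) for p, ys in sorted(groups.items())}

# Hypothetical record: ten events forecast at 30%, ten at 80%.
forecasts = [0.3] * 10 + [0.8] * 10
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] + [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]

table = calibration_table(forecasts, outcomes)
for p, (n, freq) in table.items():
    print(f"stated {p:.1f}: {n} events, observed frequency {freq:.2f}")
```

In this made-up record the observed frequencies match the stated probabilities in both groups, so the forecaster would count as well calibrated on this sample.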
In classification
Calibration in classification means transforming classifier scores into class membership probabilities. An overview of calibration methods for two-class and multi-class classification tasks is given by Gebel (2009).
A classifier might separate the classes well but be poorly calibrated, meaning that the estimated class probabilities are far from the true class probabilities. In this case, a calibration step may help improve the estimated probabilities. A variety of metrics aim to measure the extent to which a classifier produces well-calibrated probabilities. Foundational work includes the Expected Calibration Error (ECE). Into the 2020s, variants include the Adaptive Calibration Error (ACE) and the Test-based Calibration Error (TCE), which address limitations of the ECE metric that may arise when classifier scores concentrate on a narrow subset of the [0,1] range.
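As an illustration of the kind of quantity the ECE measures, the sketch below uses one common binary formulation: scores are placed in equal-width bins over [0,1], and the gap between the mean predicted probability and the observed positive rate in each bin is averaged with weights proportional to bin size. The function name, the number of bins, and the data are assumptions made for this example, not a reference implementation.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary labels (equal-width bins; an assumed variant)."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_prob - pos_rate)
    return ece

# Hypothetical overconfident classifier: always predicts 0.9,
# but only half of the cases are actually positive.
probs = [0.9] * 10
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
ece_value = expected_calibration_error(probs, labels)
print(f"ECE = {ece_value:.2f}")
```

Here every score lands in a single bin, so the estimate rests on one bin alone. This is the situation in which equal-width binning becomes coarse, which is the limitation the adaptive variants are meant to address.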
A 2020s advancement in calibration assessment is the introduction of the Estimated Calibration Index (ECI). The ECI extends the concepts of the Expected Calibration Error (ECE) to provide a more nuanced measure of a model's calibration, particularly addressing overconfidence and underconfidence tendencies. Originally formulated for binary settings, the ECI has been adapted for multiclass settings, offering both local and global insights into model calibration. This framework aims to overcome some of the theoretical and interpretative limitations of existing calibration metrics. Through a series of experiments, Famiglini et al. demonstrate the framework's effectiveness in delivering a more accurate understanding of model calibration levels and discuss strategies for mitigating biases in calibration assessment. An online tool has been proposed to compute both ECE and ECI.
The following univariate calibration methods exist for transforming classifier scores into class membership probabilities in the two-class case:
* Assignment value approach, see Garczarek (2002)
* Bayes approach, see Bennett (2002)
* Isotonic regression, see Zadrozny and Elkan (2002)
* Platt scaling (a form of logistic regression), see Lewis and Gale (1994) and Platt (1999)
* Bayesian Binning into Quantiles (BBQ) calibration, see Naeini, Cooper, Hauskrecht (2015)
* Beta calibration, see Kull, Filho, Flach (2017)
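To make one of the listed methods concrete, the sketch below fits an isotonic (non-decreasing) map from scores to probabilities using the pool-adjacent-violators algorithm, a standard way to compute isotonic regression. It is a simplified illustration rather than the procedure of any cited paper: the function name `pav_calibrate` and the data are hypothetical, and tied scores are not treated specially.

```python
def pav_calibrate(scores, labels):
    """Isotonic calibration via pool-adjacent-violators.

    Returns (sorted_scores, fitted_probabilities), where the fitted
    probabilities are non-decreasing in the score.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block is [sum_of_labels, count]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block's mean exceeds the current one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c

    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block gets the block mean
    return [scores[i] for i in order], fitted

# Hypothetical classifier scores and true binary labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 1, 1, 1]
sorted_scores, fitted = pav_calibrate(scores, labels)
print(list(zip(sorted_scores, fitted)))
```

The out-of-order labels at scores 0.2 and 0.3 are pooled into a single block with probability 0.5, which is what keeps the fitted map monotone.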
In probability prediction and forecasting
In prediction and forecasting, a Brier score is sometimes used to assess the accuracy of a set of predictions, specifically whether the magnitudes of the assigned probabilities track the relative frequencies of the observed outcomes.
Philip E. Tetlock employs the term "calibration" in this sense in his 2015 book ''Superforecasting''. This differs from accuracy and precision. For example, as expressed by Daniel Kahneman, "if you give all events that happen a probability of .6 and all the events that don't happen a probability of .4, your discrimination is perfect but your calibration is miserable".
In meteorology, in particular as concerns weather forecasting, a related mode of assessment is known as forecast skill.
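The Brier score mentioned in this section is the mean squared difference between the forecast probabilities and the 0/1 outcomes. The sketch below computes it for three hypothetical forecasters on the same made-up outcomes, including one who follows the pattern in Kahneman's remark (0.6 for every event that happens, 0.4 for every event that does not).

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Hypothetical outcomes: three events happen, three do not.
outcomes = [1, 1, 1, 0, 0, 0]

# Forecaster from Kahneman's remark: separates the cases, but timidly.
kahneman_style = [0.6 if y else 0.4 for y in outcomes]
# Forecaster who is certain and always right.
confident = [1.0 if y else 0.0 for y in outcomes]
# Forecaster who always states the base rate of 0.5.
base_rate = [0.5] * len(outcomes)

scores = {
    "kahneman_style": brier_score(kahneman_style, outcomes),
    "confident": brier_score(confident, outcomes),
    "base_rate": brier_score(base_rate, outcomes),
}
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Lower is better. The forecaster from the quotation ranks every event correctly yet scores about 0.16 rather than 0, because events given 0.6 happen every time and events given 0.4 never happen.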
In regression
The ''calibration problem'' in regression is the use of known data on the observed relationship between a dependent variable and an independent variable to make estimates of other values of the independent variable from new observations of the dependent variable. This can be known as "inverse regression"; there is also sliced inverse regression.
The following multivariate calibration methods exist for transforming classifier scores into class membership probabilities when there are more than two classes:
* Reduction to binary tasks and subsequent pairwise coupling, see Hastie and Tibshirani (1998) [T. Hastie and R. Tibshirani, "Classification by pairwise coupling." In: M. I. Jordan, M. J. Kearns and S. A. Solla (eds.), Advances in Neural Information Processing Systems, volume 10, Cambridge, MIT Press, 1998.]
* Dirichlet calibration, see Gebel (2009)
Example
One example is that of dating objects, using observable evidence such as tree rings for dendrochronology or carbon-14 for radiometric dating. The observation is caused by the age of the object being dated, rather than the reverse, and the aim is to use the method for estimating dates based on new observations. The problem is whether the model used for relating known ages with observations should aim to minimise the error in the observation, or minimise the error in the date. The two approaches will produce different results, and the difference will increase if the model is then used for extrapolation at some distance from the known results.
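The two approaches can be compared with a small sketch using straight-line least squares. Regressing the observation on age and then inverting the fitted line minimises error in the observation; regressing age directly on the observation minimises error in the date. The data, the helper `ols`, and the estimator names are hypothetical and chosen only for illustration.

```python
def ols(x, y):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical calibration set: objects of known age and their measured signal.
ages = [100, 200, 300, 400, 500]
signal = [12.0, 19.0, 33.0, 38.0, 52.0]

# Approach 1: model signal as a function of age, then invert the fitted line.
a, b = ols(ages, signal)
def estimate_by_inverting(y0):
    return (y0 - a) / b

# Approach 2: model age directly as a function of signal.
c, d = ols(signal, ages)
def estimate_by_direct_fit(y0):
    return c + d * y0

for y0 in (30.8, 40.0, 100.0):  # 30.8 is the mean signal; 100.0 is far outside the data
    x1, x2 = estimate_by_inverting(y0), estimate_by_direct_fit(y0)
    print(f"signal {y0:6.1f}: inverted {x1:7.1f}, direct {x2:7.1f}, gap {abs(x1 - x2):5.1f}")
```

With these numbers the two estimates coincide at the mean signal and drift apart as the new observation moves away from the calibration data, consistent with the point about extrapolation.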
See also
* Conformal prediction