The Mahalanobis distance is a
measure of the distance between a point
and a
probability distribution
In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
, introduced by
P. C. Mahalanobis in 1936. The mathematical details of Mahalanobis distance first appeared in the ''Journal of The Asiatic Society of Bengal'' in 1936. Mahalanobis's definition was prompted by the problem of
identifying the similarities of skulls based on measurements (the earliest work related to similarities of skulls are from 1922 and another later work is from 1927).
R.C. Bose later obtained the sampling distribution of Mahalanobis distance, under the assumption of equal dispersion.
It is a multivariate generalization of the square of the
standard score
In statistics, the standard score or ''z''-score is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured. Raw scores ...
: how many
standard deviations
In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its mean. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the ...
away
is from the
mean
A mean is a quantity representing the "center" of a collection of numbers and is intermediate to the extreme values of the set of numbers. There are several kinds of means (or "measures of central tendency") in mathematics, especially in statist ...
of
. This distance is zero for
at the mean of
and grows as
moves away from the mean along each
principal component axis. If each of these axes is re-scaled to have unit variance, then the Mahalanobis distance corresponds to standard
Euclidean distance
In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, and therefore is o ...
in the transformed space. The Mahalanobis distance is thus
unitless,
scale-invariant, and takes into account the
correlations of the
data set
A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more table (database), database tables, where every column (database), column of a table represents a particular Variable (computer sci ...
.
Definition
Given a probability distribution
on
, with mean
and positive semi-definite
covariance matrix
In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of ...
, the Mahalanobis distance of a point
from
is
Given two points
and
in
, the Mahalanobis distance between them with respect to
is
which means that
.
Since
is
positive semi-definite, so is
, thus the square roots are always defined.
We can find useful decompositions of the squared Mahalanobis distance that help to explain some reasons for the outlyingness of multivariate observations and also provide a graphical tool for identifying outliers.
By the
spectral theorem
In linear algebra and functional analysis, a spectral theorem is a result about when a linear operator or matrix can be diagonalized (that is, represented as a diagonal matrix in some basis). This is extremely useful because computations involvin ...
,
can be decomposed as
for some real
matrix. One choice for
is the symmetric square root of
, which is the
standard deviation matrix.
This gives us the equivalent definition
where
is the Euclidean norm. That is, the Mahalanobis distance is the Euclidean distance after a
whitening transformation
A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they ar ...
.
The existence of
is guaranteed by the spectral theorem, but it is not unique. Different choices have different theoretical and practical advantages.
In practice, the distribution
is usually the
sample distribution from a set of
IID samples from an underlying unknown distribution, so
is the sample mean, and
is the covariance matrix of the samples.
When the
affine span of the samples is not the entire
, the covariance matrix would not be positive-definite, which means the above definition would not work. However, in general, the Mahalanobis distance is preserved under any full-rank affine transformation of the affine span of the samples. So in case the affine span is not the entire
, the samples can be first orthogonally projected to
, where
is the dimension of the affine span of the samples, then the Mahalanobis distance can be computed as usual.
Intuitive explanation
Consider the problem of estimating the probability that a test point in ''N''-dimensional
Euclidean space
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces ...
belongs to a set, where we are given sample points that definitely belong to that set. Our first step would be to find the
centroid
In mathematics and physics, the centroid, also known as geometric center or center of figure, of a plane figure or solid figure is the arithmetic mean position of all the points in the figure. The same definition extends to any object in n-d ...
or center of mass of the sample points. Intuitively, the closer the point in question is to this center of mass, the more likely it is to belong to the set.
However, we also need to know if the set is spread out over a large range or a small range, so that we can decide whether a given distance from the center is noteworthy or not. The simplistic approach is to estimate the
standard deviation
In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its Expected value, mean. A low standard Deviation (statistics), deviation indicates that the values tend to be close to the mean ( ...
of the distances of the sample points from the center of mass. If the distance between the test point and the center of mass is less than one standard deviation, then we might conclude that it is highly probable that the test point belongs to the set. The further away it is, the more likely that the test point should not be classified as belonging to the set.
This intuitive approach can be made quantitative by defining the normalized distance between the test point and the set to be
, which reads:
. By plugging this into the normal distribution, we can derive the probability of the test point belonging to the set.
The drawback of the above approach was that we assumed that the sample points are distributed about the center of mass in a spherical manner. Were the distribution to be decidedly non-spherical, for instance ellipsoidal, then we would expect the probability of the test point belonging to the set to depend not only on the distance from the center of mass, but also on the direction. In those directions where the ellipsoid has a short axis the test point must be closer, while in those where the axis is long the test point can be further away from the center.
Putting this on a mathematical basis, the ellipsoid that best represents the set's probability distribution can be estimated by building the covariance matrix of the samples. The Mahalanobis distance is the distance of the test point from the center of mass divided by the width of the ellipsoid in the direction of the test point.
Normal distributions
For a
normal distribution
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
f(x) = \frac ...
in any number of dimensions, the probability density of an observation
is uniquely determined by the Mahalanobis distance
:
:
Specifically,
follows the
chi-squared distribution
In probability theory and statistics, the \chi^2-distribution with k Degrees of freedom (statistics), degrees of freedom is the distribution of a sum of the squares of k Independence (probability theory), independent standard normal random vari ...
with
degrees of freedom, where
is the number of dimensions of the normal distribution. If the number of dimensions is 2, for example, the probability of a particular calculated
being less than some threshold
is
. To determine a threshold to achieve a particular probability,
, use
, for 2 dimensions. For number of dimensions other than 2, the cumulative chi-squared distribution should be consulted.
In a normal distribution, the region where the Mahalanobis distance is less than one (i.e. the region inside the ellipsoid at distance one) is exactly the region where the probability distribution is
concave
Concave or concavity may refer to:
Science and technology
* Concave lens
* Concave mirror
Mathematics
* Concave function, the negative of a convex function
* Concave polygon
A simple polygon that is not convex is called concave, non-convex or ...
.
The Mahalanobis distance is proportional, for a normal distribution, to the square root of the negative
log-likelihood
A likelihood function (often simply called the likelihood) measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the j ...
(after adding a constant so the minimum is at zero).
Other forms of multivariate location and scatter

The sample mean and covariance matrix can be quite sensitive to outliers, therefore other approaches for calculating the multivariate location and scatter of data are also commonly used when calculating the Mahalanobis distance. The Minimum Covariance Determinant approach estimates multivariate location and scatter from a subset numbering
data points that has the smallest variance-covariance matrix determinant. The Minimum Volume Ellipsoid approach is similar to the Minimum Covariance Determinant approach in that it works with a subset of size
data points, but the Minimum Volume Ellipsoid estimates multivariate location and scatter from the ellipsoid of minimal volume that encapsulates the
data points. Each method varies in its definition of the distribution of the data, and therefore produces different Mahalanobis distances. The Minimum Covariance Determinant and Minimum Volume Ellipsoid approaches are more robust to samples that contain outliers, while the sample mean and covariance matrix tends to be more reliable with small and biased data sets.
Relationship to normal random variables
In general, given a normal (
Gaussian) random variable
with variance
and mean
, any other normal random variable
(with mean
and variance
) can be defined in terms of
by the equation
Conversely, to recover a normalized random variable from any normal random variable, one can typically solve for
. If we square both sides, and take the square-root, we will get an equation for a metric that looks a lot like the Mahalanobis distance:
The resulting magnitude is always non-negative and varies with the distance of the data from the mean, attributes that are convenient when trying to define a model for the data.
Relationship to leverage
Mahalanobis distance is closely related to the
leverage statistic,
, but has a different scale:
Applications
Mahalanobis distance is widely used in
cluster analysis
Cluster analysis or clustering is the data analyzing technique in which task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more Similarity measure, similar (in some specific sense defined by the ...
and
classification
Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves (for example through cluster analysis). Examples include diagnostic tests, identif ...
techniques. It is closely related to
Hotelling's T-square distribution used for multivariate statistical testing and Fisher's
linear discriminant analysis
Linear discriminant analysis (LDA), normal discriminant analysis (NDA), canonical variates analysis (CVA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to fi ...
that is used for
supervised classification.
In order to use the Mahalanobis distance to classify a test point as belonging to one of ''N'' classes, one first
estimates the covariance matrix of each class, usually based on samples known to belong to each class. Then, given a test sample, one computes the Mahalanobis distance to each class, and classifies the test point as belonging to that class for which the Mahalanobis distance is minimal.
Mahalanobis distance and leverage are often used to detect
outlier
In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to a variability in the measurement, an indication of novel data, or it may be the result of experimental error; the latter are ...
s, especially in the development of
linear regression
In statistics, linear regression is a statistical model, model that estimates the relationship between a Scalar (mathematics), scalar response (dependent variable) and one or more explanatory variables (regressor or independent variable). A mode ...
models. A point that has a greater Mahalanobis distance from the rest of the sample population of points is said to have higher leverage since it has a greater influence on the slope or coefficients of the regression equation. Mahalanobis distance is also used to determine multivariate outliers. Regression techniques can be used to determine if a specific case within a sample population is an outlier via the combination of two or more variable scores. Even for normal distributions, a point can be a multivariate outlier even if it is not a univariate outlier for any variable (consider a probability density concentrated along the line
, for example), making Mahalanobis distance a more sensitive measure than checking dimensions individually.
Mahalanobis distance has also been used in
ecological niche modelling, as the convex elliptical shape of the distances relates well to the concept of the
fundamental niche.
Another example of usage is in finance, where Mahalanobis distance has been used to compute an indicator called the "turbulence index", which is a statistical measure of financial markets abnormal behaviour. An implementation as a Web API of this indicator is available online.
Software implementations
Many programming languages and statistical packages, such as
R,
Python, etc., include implementations of Mahalanobis distance.
See also
*
Bregman divergence
In mathematics, specifically statistics and information geometry, a Bregman divergence or Bregman distance is a measure of difference between two points, defined in terms of a strictly convex function; they form an important class of divergences. ...
(the Mahalanobis distance is an example of a Bregman divergence)
*
Bhattacharyya distance related, for measuring similarity between data sets (and not between a point and a data set)
*
Hamming distance
In information theory, the Hamming distance between two String (computer science), strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number ...
identifies the difference bit by bit of two strings
*
Hellinger distance
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of ''f''-divergence. The Hell ...
, also a measure of distance between data sets
*
Similarity learning, for other approaches to learn a distance metric from examples.
References
External links
*
Mahalanobis distance tutorial– interactive online program and spreadsheet computation
– overview of Mahalanobis distance, including MATLAB code
What is Mahalanobis distance?– intuitive, illustrated explanation, from Rick Wicklin on blogs.sas.com
{{DEFAULTSORT:Mahalanobis Distance
Statistical distance
Multivariate statistics
Distance
Indian inventions