In
mathematics, a Euclidean distance matrix is an
matrix
Matrix most commonly refers to:
* ''The Matrix'' (franchise), an American media franchise
** '' The Matrix'', a 1999 science-fiction action film
** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchi ...
representing the spacing of a set of
points in
Euclidean space
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are Euclidean sp ...
.
For points
in -dimensional space , the elements of their Euclidean distance matrix are given by squares of distances between them.
That is
:
where
denotes the
Euclidean norm
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are Euclidea ...
on .
:
In the context of (not necessarily Euclidean)
distance matrices, the entries are usually defined directly as distances, not their squares.
However, in the Euclidean case, squares of distances are used to avoid computing square roots and to simplify relevant theorems and algorithms.
Euclidean distance matrices are closely related to
Gram matrices (matrices of
dot products
In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an alge ...
, describing norms of vectors and angles between them).
The latter are easily analyzed using methods of
linear algebra
Linear algebra is the branch of mathematics concerning linear equations such as:
:a_1x_1+\cdots +a_nx_n=b,
linear maps such as:
:(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n,
and their representations in vector spaces and through matric ...
.
This allows to characterize Euclidean distance matrices and recover the points
that realize it.
A realization, if it exists, is unique up to
rigid transformation
In mathematics, a rigid transformation (also called Euclidean transformation or Euclidean isometry) is a geometric transformation of a Euclidean space that preserves the Euclidean distance between every pair of points.
The rigid transformatio ...
s, i.e.
distance-preserving transformations of Euclidean space (
rotations
Rotation, or spin, is the circular movement of an object around a '' central axis''. A two-dimensional rotating object has only one possible central axis and can rotate in either a clockwise or counterclockwise direction. A three-dimensional ...
,
reflections,
translations
Translation is the communication of the Meaning (linguistic), meaning of a #Source and target languages, source-language text by means of an Dynamic and formal equivalence, equivalent #Source and target languages, target-language text. The ...
).
In practical applications, distances are noisy measurements or come from arbitrary
dissimilarity estimates (not necessarily
metric
Metric or metrical may refer to:
* Metric system, an internationally adopted decimal system of measurement
* An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement
Mathematics
In mathem ...
).
The goal may be to visualize such data by points in Euclidean space whose distance matrix approximates a given dissimilarity matrix as well as possible — this is known as
multidimensional scaling
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of n objects or individuals" into a configurati ...
.
Alternatively, given two sets of data already represented by points in Euclidean space, one may ask how similar they are in shape, that is, how closely can they be related by a
distance-preserving transformation — this is
Procrustes analysis
In statistics, Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. The name ''Procrustes'' ( el, Προκρούστης) refers to a bandit from Greek mythology who made his victims fit ...
.
Some of the distances may also be missing or come unlabelled (as an unordered set or multiset instead of a matrix), leading to more complex algorithmic tasks, such as the graph realization problem or the turnpike problem (for points on a line).
Properties
By the fact that Euclidean distance is a
metric
Metric or metrical may refer to:
* Metric system, an internationally adopted decimal system of measurement
* An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement
Mathematics
In mathem ...
, the matrix has the following properties.
* All elements on the
diagonal
In geometry, a diagonal is a line segment joining two vertices of a polygon or polyhedron, when those vertices are not on the same edge. Informally, any sloping line is called diagonal. The word ''diagonal'' derives from the ancient Gree ...
of are zero (i.e. it is a
hollow matrix); hence the
trace of is zero.
* is
symmetric
Symmetry (from grc, συμμετρία "agreement in dimensions, due proportion, arrangement") in everyday language refers to a sense of harmonious and beautiful proportion and balance. In mathematics, "symmetry" has a more precise definit ...
(i.e.
).
*
(by the
triangle inequality
In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side.
This statement permits the inclusion of degenerate triangles, bu ...
)
*
In dimension , a Euclidean distance matrix has
rank
Rank is the relative position, value, worth, complexity, power, importance, authority, level, etc. of a person or object within a ranking, such as:
Level or position in a hierarchical organization
* Academic rank
* Diplomatic rank
* Hierarchy
* H ...
less than or equal to . If the points
are in
general position
In algebraic geometry and computational geometry, general position is a notion of genericity for a set of points, or other geometric objects. It means the ''general case'' situation, as opposed to some more special or coincidental cases that are ...
, the rank is exactly
Distances can be shrunk by any power to obtain another Euclidean distance matrix. That is, if
is a Euclidean distance matrix, then
is a Euclidean distance matrix for every .
Relation to Gram matrix
The
Gram matrix of a sequence of points
in -dimensional space
is the matrix
of their
dot product
In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an alg ...
s (here a point
is thought of as a vector from 0 to that point):
:
, where
is the angle between the vector
and
.
In particular
:
is the square of the distance of
from 0.
Thus the Gram matrix describes norms and angles of vectors (from 0 to)
.
Let
be the matrix containing
as columns.
Then
:
, because
(seeing
as a column vector).
Matrices that can be decomposed as
, that is, Gram matrices of some sequence of vectors (columns of
), are well understood — these are precisely
positive semidefinite matrices.
To relate the Euclidean distance matrix to the Gram matrix, observe that
:
That is, the norms and angles determine the distances.
Note that the Gram matrix contains additional information: distances from 0.
Conversely, distances
between pairs of points
determine dot products between vectors
():
:
(this is known as the
polarization identity
In linear algebra, a branch of mathematics, the polarization identity is any one of a family of formulas that express the inner product of two vectors in terms of the norm of a normed vector space.
If a norm arises from an inner product the ...
).
Characterizations
For a matrix , a sequence of points
in -dimensional Euclidean space
is called a realization of in if is their Euclidean distance matrix.
One can assume without loss of generality that
(because
translating
Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transla ...
by
preserves distances).
This follows from the previous discussion because is positive semidefinite of rank at most if and only if it can be decomposed as
where is an matrix.
Moreover, the columns of give a realization in .
Therefore, any method to decompose allows to find a realization.
The two main approaches are variants of
Cholesky decomposition
In linear algebra, the Cholesky decomposition or Cholesky factorization (pronounced ) is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which is useful for effi ...
or using
spectral decompositions to find the
principal square root of , see
Definite matrix#Decomposition.
The statement of theorem distinguishes the first point
. A more symmetric variant of the same theorem is the following:
Other characterizations involve
Cayley–Menger determinants.
In particular, these allow to show that a symmetric
hollow
Hollow may refer to:
Natural phenomena
*Hollow, a low, wooded area, such as a copse
* Hollow (landform), a small vee-shaped, riverine type of valley
*Tree hollow, a void in a branch or trunk, which may provide habitat for animals
Places
*Sleepy ...
matrix is realizable in if and only if every
principal submatrix
In mathematics, a matrix (plural matrices) is a rectangular array or table of numbers, symbols, or expressions, arranged in rows and columns, which is used to represent a mathematical object or a property of such an object.
For example,
\ ...
is.
In other words, a
semimetric on finitely many points is
embedabble isometrically in if and only if every points are.
In practice, the definiteness or rank conditions may fail due to numerical errors, noise in measurements, or due to the data not coming from actual Euclidean distances.
Points that realize optimally similar distances can then be found by semidefinite approximation (and low rank approximation, if desired) using linear algebraic tools such as
singular value decomposition
In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any \ m \times n\ matrix. It is r ...
or
semidefinite programming Semidefinite programming (SDP) is a subfield of convex optimization concerned with the optimization of a linear objective function (a user-specified function that the user wants to minimize or maximize)
over the intersection of the cone of positiv ...
.
This is known as
multidimensional scaling
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of n objects or individuals" into a configurati ...
.
Variants of these methods can also deal with incomplete distance data.
Unlabeled data, that is, a set or multiset of distances not assigned to particular pairs, is much more difficult to deal with.
Such data arises, for example, in
DNA sequencing (specifically, genome recovery from
partial digest) or
phase retrieval
Phase retrieval is the process of algorithmically finding solutions to the phase problem. Given a complex signal F(k), of amplitude , F (k), , and phase \psi(k):
::F(k) = , F(k), e^ =\int_^ f(x)\ e^\,dx
where ''x'' is an ''M''-dimensional spatia ...
.
Two sets of points are called
homometric if they have the same multiset of distances (but are not necessarily related by a rigid transformation).
Deciding whether a given multiset of distances can be realized in a given dimension is
strongly NP-hard.
In one dimension this is known as the turnpike problem; it is an open question whether it can be solved in polynomial time.
When the multiset of distances is given with error bars, even the one dimensional case is
NP-hard
In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...
.
Nevertheless, practical algorithms exist for many cases, e.g. random points.
Uniqueness of representations
Given a Euclidean distance matrix, the sequence of points that realize it is unique up to
rigid transformation
In mathematics, a rigid transformation (also called Euclidean transformation or Euclidean isometry) is a geometric transformation of a Euclidean space that preserves the Euclidean distance between every pair of points.
The rigid transformatio ...
s – these are
isometries of Euclidean space:
rotations
Rotation, or spin, is the circular movement of an object around a '' central axis''. A two-dimensional rotating object has only one possible central axis and can rotate in either a clockwise or counterclockwise direction. A three-dimensional ...
,
reflections,
translations
Translation is the communication of the Meaning (linguistic), meaning of a #Source and target languages, source-language text by means of an Dynamic and formal equivalence, equivalent #Source and target languages, target-language text. The ...
, and their compositions.
Rigid transformations preserve distances so one direction is clear.
Suppose the distances
and
are equal.
Without loss of generality we can assume
by translating the points by
and
, respectively.
Then the Gram matrix of remaining vectors
is identical to the Gram matrix of vectors
().
That is,
, where and are the matrices containing the respective vectors as columns.
This implies there exists an
orthogonal
In mathematics, orthogonality is the generalization of the geometric notion of '' perpendicularity''.
By extension, orthogonality is also used to refer to the separation of specific features of a system. The term also has specialized meanings in ...
matrix such that , see
Definite symmetric matrix#Uniqueness up to unitary transformations.
describes an
orthogonal transformation In linear algebra, an orthogonal transformation is a linear transformation ''T'' : ''V'' → ''V'' on a real inner product space ''V'', that preserves the inner product. That is, for each pair of elements of ''V'', we have
...
of (a composition of rotations and reflections, without translations) which maps
to
(and 0 to 0).
The final rigid transformation is described by
.
In applications, when distances don't match exactly,
Procrustes analysis
In statistics, Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. The name ''Procrustes'' ( el, Προκρούστης) refers to a bandit from Greek mythology who made his victims fit ...
aims to relate two point sets as close as possible via rigid transformations, usually using
singular value decomposition
In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any \ m \times n\ matrix. It is r ...
.
The ordinary Euclidean case is known as the
orthogonal Procrustes problem or
Wahba's problem (when observations are weighted to account for varying uncertainties).
Examples of applications include determining orientations of satellites, comparing molecule structure (in
cheminformatics
Cheminformatics (also known as chemoinformatics) refers to use of physical chemistry theory with computer and information science techniques—so called "''in silico''" techniques—in application to a range of descriptive and prescriptive proble ...
), protein structure (
structural alignment
Structural alignment attempts to establish homology between two or more polymer structures based on their shape and three-dimensional conformation. This process is usually applied to protein tertiary structures but can also be used for large RN ...
in
bioinformatics
Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combin ...
), or bone structure (
statistical shape analysis Statistical shape analysis is an analysis of the geometrical properties of some given set of shapes by statistical methods. For instance, it could be used to quantify differences between male and female gorilla skull shapes, normal and pathological ...
in biology).
See also
*
Adjacency matrix
In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
In the special case of a finite simple ...
*
Coplanarity
*
Distance geometry Distance geometry is the branch of mathematics concerned with characterizing and studying sets of points based ''only'' on given values of the distances between pairs of points. More abstractly, it is the study of semimetric spaces and the isom ...
*
Hollow matrix
*
Distance matrix
In mathematics, computer science and especially graph theory, a distance matrix is a square matrix
In mathematics, a square matrix is a matrix with the same number of rows and columns. An ''n''-by-''n'' matrix is known as a square matrix of orde ...
*
Euclidean random matrix Within mathematics, an ''N''×''N'' Euclidean random matrix  is defined with the help of an arbitrary deterministic function ''f''(r, r′) and of ''N'' points randomly distributed in a region ''V'' of ''d''-dimensional Euclidean space. The eleme ...
* Classical
multidimensional scaling
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of n objects or individuals" into a configurati ...
, a visualization technique that approximates an arbitrary dissimilarity matrix by a Euclidean distance matrix
*
Cayley–Menger determinant
*
Semidefinite embedding
Notes
References
*
*
*
*
*
{{Matrix classes
Matrices
Distance