mathematics Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...

, a Euclidean distance matrix is an

matrix Matrix (: matrices or matrixes) or MATRIX may refer to: Science and mathematics * Matrix (mathematics), a rectangular array of numbers, symbols or expressions * Matrix (logic), part of a formula in prenex normal form * Matrix (biology), the m ...

representing the spacing of a set of

points A point is a small dot or the sharp tip of something. Point or points may refer to: Mathematics * Point (geometry), an entity that has a location in space or on a plane, but has no extent; more generally, an element of some abstract topologica ...

Euclidean space Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces ...

. For points

x_1,x_2,\ldots,x_n

in -dimensional space , the elements of their Euclidean distance matrix are given by squares of distances between them. That is :

\begin
A & = (a_); \\
a_ & = d_^2 \;=\; \lVert x_i - x_j\rVert^2
\end

where

\, \cdot\,

denotes the

Euclidean norm Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces'' ...

on . :

A = \begin
0 & d_^2 & d_^2 & \dots & d_^2 \\
d_^2 & 0 & d_^2 & \dots & d_^2 \\
d_^2 & d_^2 & 0 & \dots & d_^2 \\
\vdots&\vdots & \vdots & \ddots&\vdots&  \\
d_^2 & d_^2 & d_^2 & \dots & 0 \\
\end

In the context of (not necessarily Euclidean) distance matrices, the entries are usually defined directly as distances, not their squares. However, in the Euclidean case, squares of distances are used to avoid computing square roots and to simplify relevant theorems and algorithms. Euclidean distance matrices are closely related to Gram matrices (matrices of dot products, describing norms of vectors and angles between them). The latter are easily analyzed using methods of

linear algebra Linear algebra is the branch of mathematics concerning linear equations such as :a_1x_1+\cdots +a_nx_n=b, linear maps such as :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrix (mathemat ...

. This allows to characterize Euclidean distance matrices and recover the points

x_1,x_2,\ldots,x_n

that realize it. A realization, if it exists, is unique up to

rigid transformation In mathematics, a rigid transformation (also called Euclidean transformation or Euclidean isometry) is a geometric transformation of a Euclidean space that preserves the Euclidean distance between every pair of points. The rigid transformation ...

s, i.e. distance-preserving transformations of Euclidean space ( rotations, reflections,

translations Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transl ...

). In practical applications, distances are noisy measurements or come from arbitrary dissimilarity estimates (not necessarily

metric Metric or metrical may refer to: Measuring * Metric system, an internationally adopted decimal system of measurement * An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement Mathematics ...

). The goal may be to visualize such data by points in Euclidean space whose distance matrix approximates a given dissimilarity matrix as well as possible — this is known as

multidimensional scaling Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a data set. MDS is used to translate distances between each pair of n objects in a set into a configuration of n points mapped into an ...

. Alternatively, given two sets of data already represented by points in Euclidean space, one may ask how similar they are in shape, that is, how closely can they be related by a distance-preserving transformation — this is Procrustes analysis. Some of the distances may also be missing or come unlabelled (as an unordered set or multiset instead of a matrix), leading to more complex algorithmic tasks, such as the graph realization problem or the turnpike problem (for points on a line).

Properties

By the fact that Euclidean distance is a

, the matrix has the following properties. * All elements on the

diagonal In geometry, a diagonal is a line segment joining two vertices of a polygon or polyhedron, when those vertices are not on the same edge. Informally, any sloping line is called diagonal. The word ''diagonal'' derives from the ancient Greek � ...

of are zero (i.e. it is a

hollow matrix In mathematics, a hollow matrix may refer to one of several related classes of matrix: a sparse matrix; a matrix with a large block of zeroes; or a matrix with diagonal entries all zero. Definitions Sparse A ''hollow matrix'' may be one with "f ...

); hence the

trace Trace may refer to: Arts and entertainment Music * ''Trace'' (Son Volt album), 1995 * ''Trace'' (Died Pretty album), 1993 * Trace (band), a Dutch progressive rock band * ''The Trace'' (album), by Nell Other uses in arts and entertainment * ...

of is zero. * is

symmetric Symmetry () in everyday life refers to a sense of harmonious and beautiful proportion and balance. In mathematics, the term has a more precise definition and is usually used to refer to an object that is invariant under some transformations ...

(i.e.

a_ = a_

). *

\sqrt \le \sqrt + \sqrt

(by the

triangle inequality In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side. This statement permits the inclusion of Degeneracy (mathematics)#T ...

) *

a_\ge 0

In dimension , a Euclidean distance matrix has

rank A rank is a position in a hierarchy. It can be formally recognized—for example, cardinal, chief executive officer, general, professor—or unofficial. People Formal ranks * Academic rank * Corporate title * Diplomatic rank * Hierarchy ...

less than or equal to . If the points

x_1,x_2,\ldots,x_n

are in

general position In algebraic geometry and computational geometry, general position is a notion of genericity for a set of points, or other geometric objects. It means the ''general case'' situation, as opposed to some more special or coincidental cases that a ...

, the rank is exactly Distances can be shrunk by any power to obtain another Euclidean distance matrix. That is, if

A=(a_)

is a Euclidean distance matrix, then

(^s)

is a Euclidean distance matrix for every .

Relation to Gram matrix

The

Gram matrix In linear algebra, the Gram matrix (or Gramian matrix, Gramian) of a set of vectors v_1,\dots, v_n in an inner product space is the Hermitian matrix of inner products, whose entries are given by the inner product G_ = \left\langle v_i, v_j \right\r ...

of a sequence of points

x_1,x_2,\ldots,x_n

in -dimensional space is the matrix

G = (g_)

of their

dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a Scalar (mathematics), scalar as a result". It is also used for other symmetric bilinear forms, for example in a pseudo-Euclidean space. N ...

s (here a point

x_i

is thought of as a vector from 0 to that point): :

g_ = x_i \cdot x_j = \, x_i\,  \, x_j\,  \cos \theta

, where

\theta

is the angle between the vector

x_i

and

x_j

. In particular :

g_ = \, x_i\, ^2

is the square of the distance of

x_i

from 0. Thus the Gram matrix describes norms and angles of vectors (from 0 to)

x_1,x_2,\ldots,x_n

. Let

X

be the matrix containing

x_1,x_2,\ldots,x_n

as columns. Then :

G = X^\textsf X

, because

g_ = x_i^\textsf x_j

(seeing

x_i

as a column vector). Matrices that can be decomposed as

X^\textsfX

, that is, Gram matrices of some sequence of vectors (columns of

X

), are well understood — these are precisely positive semidefinite matrices. To relate the Euclidean distance matrix to the Gram matrix, observe that :

d_^2 = \, x_i - x_j\, ^2 = (x_i - x_j)^\textsf (x_i - x_j) = x_i^\textsf x_i - 2x_i^\textsf x_j + x_j^\textsf x_j = g_ -2g_ + g_

That is, the norms and angles determine the distances. Note that the Gram matrix contains additional information: distances from 0. Conversely, distances

d_

between pairs of points

x_0,x_1,\ldots,x_n

determine dot products between vectors

x_i-x_0

(): :

g_ = (x_i-x_0) \cdot (x_j-x_0) = \frac\left(\, x_i-x_0\, ^2 + \, x_j-x_0\, ^2 - \, x_i - x_j\, ^2 \right) = \frac(d_^2 + d_^2 - d_^2)

(this is known as the

polarization identity In linear algebra, a branch of mathematics, the polarization identity is any one of a family of formulas that express the inner product of two vectors in terms of the norm of a normed vector space. If a norm arises from an inner product t ...

Characterizations

For a matrix , a sequence of points

x_1,x_2,\ldots,x_n

in -dimensional Euclidean space is called a realization of in if is their Euclidean distance matrix. One can assume without loss of generality that

x_1 = \mathbf

(because translating by

-x_1

preserves distances). This follows from the previous discussion because is positive semidefinite of rank at most if and only if it can be decomposed as

G = X^\textsf X

where is a matrix. Moreover, the columns of give a realization in . Therefore, any method to decompose allows to find a realization. The two main approaches are variants of

Cholesky decomposition In linear algebra, the Cholesky decomposition or Cholesky factorization (pronounced ) is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which is useful for eff ...

or using spectral decompositions to find the principal square root of , see Definite matrix#Decomposition. The statement of theorem distinguishes the first point

x_1

. A more symmetric variant of the same theorem is the following: Other characterizations involve

Cayley–Menger determinant In linear algebra, geometry, and trigonometry, the Cayley–Menger determinant is a formula for the content, i.e. the higher-dimensional volume, of a n-dimensional simplex in terms of the squares of all of the distances between pairs of its ...

s. In particular, these allow to show that a symmetric hollow matrix is realizable in if and only if every principal submatrix is. In other words, a semimetric on finitely many points is embedabble isometrically in if and only if every points are. In practice, the definiteness or rank conditions may fail due to numerical errors, noise in measurements, or due to the data not coming from actual Euclidean distances. Points that realize optimally similar distances can then be found by semidefinite approximation (and low rank approximation, if desired) using linear algebraic tools such as

singular value decomposition In linear algebra, the singular value decomposition (SVD) is a Matrix decomposition, factorization of a real number, real or complex number, complex matrix (mathematics), matrix into a rotation, followed by a rescaling followed by another rota ...

semidefinite programming Semidefinite programming (SDP) is a subfield of mathematical programming concerned with the optimization of a linear objective function (a user-specified function that the user wants to minimize or maximize) over the intersection of the cone of po ...

. This is known as

. Variants of these methods can also deal with incomplete distance data. Unlabeled data, that is, a set or multiset of distances not assigned to particular pairs, is much more difficult to deal with. Such data arises, for example, in

DNA sequencing DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, thymine, cytosine, and guanine. The ...

(specifically, genome recovery from partial digest) or

phase retrieval Phase retrieval is the process of algorithmically finding solutions to the phase problem. Given a complex spectrum F(k), of amplitude , F (k), , and phase \psi(k): ::F(k) = , F(k), e^ =\int_^ f(x)\ e^\,dx where ''x'' is an ''M''-dimensional spat ...

. Two sets of points are called homometric if they have the same multiset of distances (but are not necessarily related by a rigid transformation). Deciding whether a given multiset of distances can be realized in a given dimension is strongly NP-hard. In one dimension this is known as the turnpike problem; it is an open question whether it can be solved in polynomial time. When the multiset of distances is given with error bars, even the one dimensional case is

NP-hard In computational complexity theory, a computational problem ''H'' is called NP-hard if, for every problem ''L'' which can be solved in non-deterministic polynomial-time, there is a polynomial-time reduction from ''L'' to ''H''. That is, assumi ...

. Nevertheless, practical algorithms exist for many cases, e.g. random points.

Uniqueness of representations

Given a Euclidean distance matrix, the sequence of points that realize it is unique up to

s – these are

isometries In mathematics, an isometry (or congruence, or congruent transformation) is a distance-preserving transformation between metric spaces, usually assumed to be bijective. The word isometry is derived from the Ancient Greek: ἴσος ''isos'' mea ...

of Euclidean space: rotations, reflections,

, and their compositions. Rigid transformations preserve distances so one direction is clear. Suppose the distances

\, x_i-x_j\,

and

\, y_i-y_j\,

are equal. Without loss of generality we can assume

x_1=y_1=\textbf

by translating the points by

-x_1

and

-y_1

, respectively. Then the Gram matrix of remaining vectors

x_i=x_i-x_1

is identical to the Gram matrix of vectors

y_i

(). That is,

X^\textsf X = Y^\textsf Y

, where and are the matrices containing the respective vectors as columns. This implies there exists an

orthogonal In mathematics, orthogonality (mathematics), orthogonality is the generalization of the geometric notion of ''perpendicularity''. Although many authors use the two terms ''perpendicular'' and ''orthogonal'' interchangeably, the term ''perpendic ...

matrix such that , see Definite symmetric matrix#Uniqueness up to unitary transformations. describes an

orthogonal transformation In linear algebra, an orthogonal transformation is a linear transformation ''T'' : ''V'' → ''V'' on a real inner product space ''V'', that preserves the inner product. That is, for each pair of elements of ''V'', we hav ...

of (a composition of rotations and reflections, without translations) which maps

x_i

y_i

(and 0 to 0). The final rigid transformation is described by

T(x) = Q(x-x_1)+y_1

. In applications, when distances don't match exactly, Procrustes analysis aims to relate two point sets as close as possible via rigid transformations, usually using

. The ordinary Euclidean case is known as the

orthogonal Procrustes problem The orthogonal Procrustes problem is a matrix approximation problem in linear algebra. In its classical form, one is given two matrices A and B and asked to find an orthogonal matrix \Omega which most closely maps A to B. Specifically, the orthog ...

or Wahba's problem (when observations are weighted to account for varying uncertainties). Examples of applications include determining orientations of satellites, comparing molecule structure (in

cheminformatics Cheminformatics (also known as chemoinformatics) refers to the use of physical chemistry theory with computer and information science techniques—so called "'' in silico''" techniques—in application to a range of descriptive and prescriptive ...

), protein structure ( structural alignment in

bioinformatics Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...

), or bone structure ( statistical shape analysis in biology).

Notes

References