
In
geometry
Geometry (; ) is a branch of mathematics concerned with properties of space such as the distance, shape, size, and relative position of figures. Geometry is, along with arithmetic, one of the oldest branches of mathematics. A mathematician w ...
, the geometric median of a
discrete point set in a
Euclidean space
Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are ''Euclidean spaces ...
is the point minimizing the sum of distances to the sample points. This generalizes the
median
The median of a set of numbers is the value separating the higher half from the lower half of a Sample (statistics), data sample, a statistical population, population, or a probability distribution. For a data set, it may be thought of as the “ ...
, which has the property of minimizing the sum of distances or absolute differences for one-dimensional data. It is also known as the spatial median,
Euclidean minisum point,
Torricelli point, or 1-median. It provides a measure of
central tendency in higher dimensions and it is a standard problem in
facility location, i.e., locating a facility to minimize the cost of transportation.
The geometric median is an important
estimator
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on Sample (statistics), observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguish ...
of
location in statistics, because it minimizes the sum of the
''L''2 distances of the samples. It is to be compared to the mean, which minimizes the sum of the ''squared'' ''L''
2 distances; and to the coordinate-wise median which minimizes the sum of the
''L''1 distances.
The more general
''k''-median problem asks for the location of ''k'' cluster centers minimizing the sum of ''L''
2 distances from each sample point to its nearest center.
The special case of the problem for three points in the plane (that is, = 3 and = 2 in the definition below) is sometimes also known as ''Fermat's problem''; it arises in the construction of minimal
Steiner trees, and was originally posed as a problem by
Pierre de Fermat
Pierre de Fermat (; ; 17 August 1601 – 12 January 1665) was a French mathematician who is given credit for early developments that led to infinitesimal calculus, including his technique of adequality. In particular, he is recognized for his d ...
and solved by
Evangelista Torricelli
Evangelista Torricelli ( ; ; 15 October 160825 October 1647) was an Italian people, Italian physicist and mathematician, and a student of Benedetto Castelli. He is best known for his invention of the barometer, but is also known for his advances i ...
. Its solution is now known as the ''
Fermat point'' of the triangle formed by the three sample points. The geometric median may in turn be generalized to the problem of minimizing the sum of ''weighted'' distances, known as the ''
Weber problem'' after
Alfred Weber's discussion of the problem in his 1909 book on facility location.
Some sources instead call Weber's problem the ''Fermat–Weber problem'', but others use this name for the unweighted geometric median problem.
provides a survey of the geometric median problem. See for generalizations of the problem to non-discrete point sets.
Definition
Formally, for a given set of ''m'' points
with each
, the geometric median is defined as the sum of the ''L''
2 distances minimizer
:
Here,
arg min means the value of the argument
which minimizes the sum. In this case, it is the point
in ''n''-dimensional Euclidean space from where the sum of all
Euclidean distance
In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, and therefore is o ...
s to the
's is minimum.
Properties
* For the 1-dimensional case, the geometric median coincides with the
median
The median of a set of numbers is the value separating the higher half from the lower half of a Sample (statistics), data sample, a statistical population, population, or a probability distribution. For a data set, it may be thought of as the “ ...
. This is because the
univariate
In mathematics, a univariate object is an expression (mathematics), expression, equation, function (mathematics), function or polynomial involving only one Variable (mathematics), variable. Objects involving more than one variable are ''wikt:multi ...
median also minimizes the sum of distances from the points. (More precisely, if the points are ''p
1'', ..., ''p
n'', in that order, the geometric median is the middle point
if ''n'' is odd, but is not uniquely determined if ''n'' is even, when it can be any point in the line segment between the two middling points
and
.)
* The geometric median is unique whenever the points are not
collinear
In geometry, collinearity of a set of Point (geometry), points is the property of their lying on a single Line (geometry), line. A set of points with this property is said to be collinear (sometimes spelled as colinear). In greater generality, t ...
.
* The geometric median is
equivariant for Euclidean
similarity transformations, including
translation
Translation is the communication of the semantics, meaning of a #Source and target languages, source-language text by means of an Dynamic and formal equivalence, equivalent #Source and target languages, target-language text. The English la ...
and
rotation
Rotation or rotational/rotary motion is the circular movement of an object around a central line, known as an ''axis of rotation''. A plane figure can rotate in either a clockwise or counterclockwise sense around a perpendicular axis intersect ...
.
This means that one would get the same result either by transforming the geometric median, or by applying the same transformation to the sample data and finding the geometric median of the transformed data. This property follows from the fact that the geometric median is defined only from pairwise distances, and does not depend on the system of orthogonal
Cartesian coordinates
In geometry, a Cartesian coordinate system (, ) in a plane is a coordinate system that specifies each point uniquely by a pair of real numbers called ''coordinates'', which are the signed distances to the point from two fixed perpendicular o ...
by which the sample data is represented. In contrast, the component-wise median for a multivariate data set is not in general rotation invariant, nor is it independent of the choice of coordinates.
* The geometric median has a
breakdown point of 0.5.
That is, up to half of the sample data may be arbitrarily corrupted, and the median of the samples will still provide a
robust estimator for the location of the uncorrupted data.
Special cases
*For 3 (non-
collinear
In geometry, collinearity of a set of Point (geometry), points is the property of their lying on a single Line (geometry), line. A set of points with this property is said to be collinear (sometimes spelled as colinear). In greater generality, t ...
) points, if any angle of the triangle formed by those points is 120° or more, then the geometric median is the point at the vertex of that angle. If all the angles are less than 120°, the geometric median is the point inside the triangle which subtends an angle of 120° to each three pairs of triangle vertices.
This is also known as the
Fermat point of the triangle formed by the three vertices. (If the three points are collinear then the geometric median is the point between the two other points, as is the case with a one-dimensional median.)
*For 4
coplanar points, if one of the four points is inside the triangle formed by the other three points, then the geometric median is that point. Otherwise, the four points form a convex
quadrilateral
In Euclidean geometry, geometry a quadrilateral is a four-sided polygon, having four Edge (geometry), edges (sides) and four Vertex (geometry), corners (vertices). The word is derived from the Latin words ''quadri'', a variant of four, and ''l ...
and the geometric median is the crossing point of the diagonals of the quadrilateral. The geometric median of four coplanar points is the same as the unique
Radon point of the four points.
Computation
Despite the geometric median's being an easy-to-understand concept, computing it poses a challenge. The
centroid
In mathematics and physics, the centroid, also known as geometric center or center of figure, of a plane figure or solid figure is the arithmetic mean position of all the points in the figure. The same definition extends to any object in n-d ...
or
center of mass
In physics, the center of mass of a distribution of mass in space (sometimes referred to as the barycenter or balance point) is the unique point at any given time where the weight function, weighted relative position (vector), position of the d ...
, defined similarly to the geometric median as minimizing the sum of the ''squares'' of the distances to each point, can be found by a simple formula — its coordinates are the averages of the coordinates of the points — but it has been shown that no
explicit formula, nor an exact algorithm involving only arithmetic operations and ''k''th roots, can exist in general for the geometric median. Therefore, only numerical or symbolic approximations to the solution of this problem are possible under this
model of computation
In computer science, and more specifically in computability theory and computational complexity theory, a model of computation is a model which describes how an output of a mathematical function is computed given an input. A model describes how ...
.
However, it is straightforward to calculate an approximation to the geometric median using an iterative procedure in which each step produces a more accurate approximation. Procedures of this type can be derived from the fact that the sum of distances to the sample points is a
convex function
In mathematics, a real-valued function is called convex if the line segment between any two distinct points on the graph of a function, graph of the function lies above or on the graph between the two points. Equivalently, a function is conve ...
, since the distance to each sample point is convex and the sum of convex functions remains convex. Therefore, procedures that decrease the sum of distances at each step cannot get trapped in a
local optimum.
One common approach of this type, called Weiszfeld's algorithm after the work of
Endre Weiszfeld, is a form of
iteratively re-weighted least squares. This algorithm defines a set of weights that are inversely proportional to the distances from the current estimate to the sample points, and creates a new estimate that is the weighted average of the sample according to these weights. That is,
:
This method converges for almost all initial positions, but may fail to converge when one of its estimates falls on one of the given points. It can be modified to handle these cases so that it converges for all initial points.
describe more sophisticated geometric optimization procedures for finding approximately optimal solutions to this problem.
show how to compute the geometric median to arbitrary precision in nearly
linear time
In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations ...
.
Note also that the problem can be formulated as the
second-order cone program
:
which can be solved in polynomial time using
common optimization solvers.
Characterization of the geometric median
If ''y'' is distinct from all the given points, ''x''
''i'', then ''y'' is the geometric median if and only if it satisfies:
:
This is equivalent to:
:
which is closely related to Weiszfeld's algorithm.
In general, ''y'' is the geometric median if and only if there are vectors ''u''
''i'' such that:
:
where for ''x''
''i'' ≠ ''y'',
:
and for ''x''
''i'' = ''y'',
:
An equivalent formulation of this condition is
:
It can be seen as a generalization of the median property, in the sense that any partition of the points, in particular as induced by any hyperplane through ''y'', has the same and opposite sum of positive ''directions'' from ''y'' on each side. In the one dimensional case, the hyperplane is the point ''y'' itself, and the sum of directions simplifies to the (directed) counting measure.
Generalizations
The geometric median can be generalized from Euclidean spaces to general
Riemannian manifold
In differential geometry, a Riemannian manifold is a geometric space on which many geometric notions such as distance, angles, length, volume, and curvature are defined. Euclidean space, the N-sphere, n-sphere, hyperbolic space, and smooth surf ...
s (and even
metric space
In mathematics, a metric space is a Set (mathematics), set together with a notion of ''distance'' between its Element (mathematics), elements, usually called point (geometry), points. The distance is measured by a function (mathematics), functi ...
s) using the same idea which is used to define the
Fréchet mean on a Riemannian manifold.
Let
be a Riemannian manifold with corresponding distance function
, let
be
weights summing to 1, and let
be
observations from
. Then we define the weighted geometric median
(or weighted Fréchet median) of the data points as
:
.
If all the weights are equal, we say simply that
is the geometric median.
See also
*
Medoid
*
Geometric median absolute deviation
Notes
References
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*.
*
*
*
*
* Translated into English as
{{refend
Means
Multivariate statistics
Nonparametric statistics
Mathematical optimization
Geometric algorithms
Descriptive statistics