The closest pair of points problem or closest pair problem is a problem of computational geometry: given

n

points in

metric space In mathematics, a metric space is a set together with a notion of '' distance'' between its elements, usually called points. The distance is measured by a function called a metric or distance function. Metric spaces are the most general set ...

, find a pair of points with the smallest distance between them. The closest pair problem for points in the Euclidean plane was among the first geometric problems that were treated at the origins of the systematic study of the computational complexity of geometric algorithms.

Time bounds

Randomized algorithms that solve the problem in linear time are known, in

Euclidean space Euclidean space is the fundamental space of geometry, intended to represent physical space. Originally, that is, in Euclid's ''Elements'', it was the three-dimensional space of Euclidean geometry, but in modern mathematics there are Euclidean ...

s whose dimension is treated as a constant for the purposes of asymptotic analysis. This is significantly faster than the

O(n^2)

time (expressed here in big O notation) that would be obtained by a naive algorithm of finding distances between all pairs of points and selecting the smallest. It is also possible to solve the problem without randomization, in

random-access machine In computer science, random-access machine (RAM) is an abstract machine in the general class of register machines. The RAM is very similar to the counter machine but with the added capability of 'indirect addressing' of its registers. Like the cou ...

models of computation In computer science, and more specifically in computability theory and computational complexity theory, a model of computation is a model which describes how an output of a mathematical function is computed given an input. A model describes how ...

with unlimited memory that allow the use of the

floor function In mathematics and computer science, the floor function is the function that takes as input a real number , and gives as output the greatest integer less than or equal to , denoted or . Similarly, the ceiling function maps to the least int ...

, in near-linear

O(n\log\log n)

time. In even more restricted models of computation, such as the

algebraic decision tree In computational complexity the decision tree model is the model of computation in which an algorithm is considered to be basically a decision tree, i.e., a sequence of ''queries'' or ''tests'' that are done adaptively, so the outcome of the pre ...

, the problem can be solved in the somewhat slower

O(n\log n)

time bound, and this is optimal for this model, by a reduction from the

element uniqueness problem In computational complexity theory, the element distinctness problem or element uniqueness problem is the problem of determining whether all the elements of a list are distinct. It is a well studied problem in many different models of computation. ...

. Both

sweep line algorithm In computational geometry, a sweep line algorithm or plane sweep algorithm is an algorithmic paradigm that uses a conceptual ''sweep line'' or ''sweep surface'' to solve various problems in Euclidean space. It is one of the key techniques in compu ...

s and

divide-and-conquer algorithm In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved dire ...

s with this slower time bound are commonly taught as examples of these algorithm design techniques.

Linear-time randomized algorithms

A linear

expected time In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case com ...

randomized algorithm of , modified slightly by

Richard Lipton Richard Jay Lipton (born September 6, 1946) is an American computer scientist who is Associate Dean of Research, Professor, and the Frederick G. Storey Chair in Computing in the College of Computing at the Georgia Institute of Technology. He has ...

to make its analysis easier, proceeds as follows, on an input set

S

consisting of

n

points in a

k

-dimensional Euclidean space: # Select

n

pairs of points uniformly at random, with replacement, and let

d

be the minimum distance of the selected pairs. # Round the input points to a square grid of points whose size (the separation between adjacent grid points) is

d

, and use a

hash table In computing, a hash table, also known as hash map, is a data structure that implements an associative array or dictionary. It is an abstract data type that maps keys to values. A hash table uses a hash function to compute an ''index'', ...

to collect together pairs of input points that round to the same grid point. # For each input point, compute the distance to all other inputs that either round to the same grid point or to another grid point within the

Moore neighborhood In cellular automata, the Moore neighborhood is defined on a two-dimensional square lattice and is composed of a central cell and the eight cells that surround it. Name The neighborhood is named after Edward F. Moore, a pioneer of cellular aut ...

3^k-1

surrounding grid points. # Return the smallest of the distances computed throughout this process. The algorithm will always correctly determine the closest pair, because it maps any pair closer than distance

d

to the same grid point or to adjacent grid points. The uniform sampling of pairs in the first step of the algorithm (compared to a different method of Rabin for sampling a similar number of pairs) simplifies the proof that the expected number of distances computed by the algorithm is linear. Instead, a different algorithm goes through two phases: a random iterated filtering process that approximates the closest distance to within an

approximation ratio An approximation is anything that is intentionally similar but not exactly equal to something else. Etymology and usage The word ''approximation'' is derived from Latin ''approximatus'', from ''proximus'' meaning ''very near'' and the prefix ' ...

2\sqrt

, together with a finishing step that turns this approximate distance into the exact closest distance. The filtering process repeat the following steps, until

S

becomes empty: # Choose a point

p

uniformly at random from

S

. # Compute the distances from

p

to all the other points of

S

and let

d

be the minimum such distance. # Round the input points to a square grid of size

d/(2\sqrt)

, and delete from

S

all points whose Moore neighborhood has no other points. The approximate distance found by this filtering process is the final value of

d

, computed in the step before

S

becomes empty. Each step removes all points whose closest neighbor is at distance

d

or greater, at least half of the points in expectation, from which it follows that the total expected time for filtering is linear. Once an approximate value of

d

is known, it can be used for the final steps of Rabin's algorithm; in these steps each grid point has a constant number of inputs rounded to it, so again the time is linear.

Dynamic closest-pair problem

The dynamic version for the closest-pair problem is stated as follows: *Given a dynamic set of objects, find algorithms and data structures for efficient recalculation of the closest pair of objects each time the objects are inserted or deleted. If the

bounding box In geometry, the minimum or smallest bounding or enclosing box for a point set in dimensions is the box with the smallest measure (area, volume, or hypervolume in higher dimensions) within which all the points lie. When other kinds of measure ...

for all points is known in advance and the constant-time floor function is available, then the expected

O(n)

-space data structure was suggested that supports expected-time

O(\log n)

insertions and deletions and constant query time. When modified for the algebraic decision tree model, insertions and deletions would require

O(\log^2 n)

expected time. The complexity of the dynamic closest pair algorithm cited above is exponential in the dimension

d

, and therefore such an algorithm becomes less suitable for high-dimensional problems. An algorithm for the dynamic closest-pair problem in

d

dimensional space was developed by Sergey Bespamyatnikh in 1998. Points can be inserted and deleted in

O(\log n)

time per point (in the worst case).

Notes

{{reflist Geometric algorithms

Time bounds

Linear-time randomized algorithms

Dynamic closest-pair problem

See also

Notes