The cover tree is a type of
data structure
In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
in
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
that is specifically designed to facilitate the speed-up of a
nearest neighbor search
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: ...
. It is a refinement of the Navigating Net data structure, and related to a variety of other data structures developed for indexing intrinsically low-dimensional data.
[Kenneth Clarkson. Nearest-neighbor searching and metric space dimensions. In G. Shakhnarovich, T. Darrell, and P. Indyk, editors, Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, pages 15--59. MIT Press, 2006.]
The tree can be thought of as a hierarchy of levels with the top level containing the root
point and the bottom level containing every point in the metric space. Each level ''C'' is associated with an integer value ''i'' that decrements by one as the tree is descended. Each level ''C'' in the cover tree has three important properties:
*Nesting:
*Covering: For every point
, there exists a point
such that the distance from
to
is less than or equal to
and exactly one such
is a parent of
.
*Separation: For all points
, the distance from
to
is greater than
.
Complexity
Find
Like other
metric tree
A metric tree is any tree data structure specialized to index data in metric spaces. Metric trees exploit properties of metric spaces such as the triangle inequality to make accesses to the data more efficient. Examples include the M-tree, vp-tr ...
s the cover tree allows for nearest neighbor searches in
where
is a constant associated with the dimensionality of the dataset and n is the cardinality. To compare, a basic linear search requires
, which is a much worse dependence on
. However, in high-dimensional
metric space
In mathematics, a metric space is a Set (mathematics), set together with a notion of ''distance'' between its Element (mathematics), elements, usually called point (geometry), points. The distance is measured by a function (mathematics), functi ...
s the
constant is non-trivial, which means it cannot be ignored in complexity analysis. Unlike other metric trees, the cover tree has a theoretical bound on its constant that is based on the dataset's
expansion constant
Expansion may refer to:
Arts, entertainment and media
* ''L'Expansion'', a French monthly business magazine
* ''Expansion'' (album), by American jazz pianist Dave Burrell, released in 2004
* ''Expansions'' (McCoy Tyner album), 1970
* ''Expansi ...
or doubling constant (in the case of approximate NN retrieval). The bound on search time is
where
is the expansion constant of the dataset.
Insert
Although cover trees provide faster searches than the naive approach, this advantage must be weighed with the additional cost of maintaining the data structure. In a naive approach adding a new point to the dataset is trivial because order does not need to be preserved, but in a cover tree it can take
time. However, this is an upper-bound, and some techniques have been implemented that seem to improve the performance in practice.
Space
The cover tree uses implicit representation to keep track of repeated points. Thus, it only requires O(n) space.
See also
*
Nearest neighbor search
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: ...
*
kd-tree
In computer science, a ''k''-d tree (short for ''k-dimensional tree'') is a space-partitioning data structure for organizing points in a ''k''-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any ...
*
M-tree
In computer science, M-trees are tree data structures that are similar to R-trees and B-trees. It is constructed using a metric and relies on the triangle inequality for efficient range and k-nearest neighbor (k-NN) queries.
While M-trees can per ...
References
;Notes
;Bibliography
* Alina Beygelzimer, Sham Kakade, and John Langford. Cover Trees for Nearest Neighbor. In Proc. International Conference on Machine Learning (ICML), 2006.
*
JL's Cover Tree page John Langford's page links to papers and code.
*
A C++ Cover Tree implementation on GitHub
*
A cover tree implementation in Java.
{{CS-Trees
Trees (data structures)