Hierarchical Navigable Small World
   HOME

TheInfoList



OR:

The Hierarchical navigable small world (HNSW) algorithm is a
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discret ...
-based approximate nearest neighbor search technique used in many
vector database A vector database, vector store or vector search engine is a database that uses the vector space model to store vectors (fixed-length lists of numbers) along with other data items. Vector databases typically implement one or more Nearest neighbor ...
s. Nearest neighbor search without an index involves computing the distance from the query to each point in the database, which for large datasets is computationally prohibitive. For high-dimensional data, tree-based exact vector search techniques such as the
k-d tree In computer science, a ''k''-d tree (short for ''k-dimensional tree'') is a space-partitioning data structure for organizing points in a ''k''-dimensional space. K-dimensional is that which concerns exactly k orthogonal axes or a space of any ...
and
R-tree R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found s ...
do not perform well enough because of the
curse of dimensionality The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. T ...
. To remedy this, ''approximate'' k-nearest neighbor searches have been proposed, such as
locality-sensitive hashing In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Si ...
(LSH) and
product quantization Product may refer to: Business * Product (business), an item that can be offered to a market to satisfy the desire or need of a customer. * Product (project management), a deliverable or set of deliverables that contribute to a business solution ...
(PQ) that trade performance for accuracy. The HNSW graph offers an approximate k-nearest neighbor search which scales logarithmically even in high-dimensional data. It is an extension of the earlier work on navigable small world graphs presented at the Similarity Search and Applications (SISAP) conference in 2012 with an additional hierarchical navigation to find entry points to the main graph faster. HNSW-based libraries are among the best performers in the approximate nearest neighbors benchmark. A related technique is IVFFlat.


Use in vector databases

HNSW is a key method for approximate nearest neighbor search in high-dimensional vector databases, for example in the context of embeddings from neural networks in large language models. Databases that use HNSW as search index include: *
Apache Lucene Apache Lucene is a free and open-source search engine software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. Lucene is widely used as a ...
Vector Search * Chroma * Qdrant * Vespa * Vearch Gamma * Weaviate * pgvector *
MariaDB MariaDB is a community-developed, commercially supported Fork (software development), fork of the MySQL relational database management system (RDBMS), intended to remain free and open-source software under the GNU General Public License. Developm ...
*
MongoDB MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database product, MongoDB uses JSON-like documents with optional database schema, schemas. Released in February 2009 by 10gen (now MongoDB ...
Atlas *
ClickHouse ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing (OLAP) that allows users to generate analytical reports using SQL queries in real-time. ClickHouse Inc. is headquartered in ...
* Milvus *
DuckDB DuckDB is an open-source column-oriented Relational Database Management System (RDBMS). It is designed to provide high performance on complex queries against large databases in embedded configuration, such as combining tables with hundreds of ...
* Kuzu * Cozo Several of these use either the hnswlib library provided by the original authors, or the
FAISS FAISS (Facebook AI Similarity Search) is an open-source library for similarity search and clustering of vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains su ...
library. libvictor is another high-performance library that implements HNSW and other indexing structures, designed for flexibility and integration in custom vector database solutions.


References

Graphs Network science Search algorithms Database index techniques Data mining Machine learning {{Compu-stub