Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

Authors: Harsha Vardhan Simhadri, G. R. Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, Suhas Jayaram Subramanya, Jingdong Wang
Despite the broad range of algorithms for Approximate Nearest Neighbor Search, most empirical evaluations of algorithms have focused on smaller datasets, typically of 1 million points (Aumüller et al., 2020). However, deploying recent advances in embedding-based techniques for search, recommendation and ranking at scale requires ANNS indices at billion, trillion or larger scale. Barring a few recent papers, there is limited consensus on which algorithms are effective at this scale vis-à-vis…


Manu: A Cloud Native Vector Database Management System
Manu is a cloud native vector database that is extensively optimized for performance and usability, with hardware-aware implementations and support for complex search semantics. It uses multi-version concurrency control (MVCC) and a delta consistency model to simplify communication and cooperation among the system components.


DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
It is demonstrated that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high-recall, low query latency and high density (points indexed per node).
A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search
This study provides a thorough comparative analysis and experimental evaluation of 13 representative graph-based ANNS algorithms via a new taxonomy and fine-grained pipeline, and designs an optimized method that outperforms the state-of-the-art algorithms.
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
This paper argues that the potential of the simple inverted index was not fully exploited in previous work, and advocates its use both for highly entangled deep descriptors and for relatively disentangled SIFT descriptors.
SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index
This paper proposes several surprisingly simple methods that answer c-ANN queries with theoretical guarantees using only a single tiny index. The methods demonstrate superior performance against state-of-the-art LSH-based methods and scale well to 1 billion high-dimensional points on a single commodity PC.
ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms
ANN-Benchmarks provides a standard interface for measuring the performance and quality achieved by nearest neighbor algorithms on different standard data sets and supports several different ways of integrating k-NN algorithms, and its configuration system automatically tests a range of parameter settings for each algorithm.
HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces
This paper proposes a novel yet simple indexing scheme, HD-Index, to solve the problem of approximate k-nearest neighbor queries in massive high-dimensional databases, and uses the Ptolemaic inequality to produce better lower bounds.
Cover trees for nearest neighbor
A tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points) that shows speedups over the brute force search varying between one and several orders of magnitude on natural machine learning datasets.
Efficient Indexing of Billion-Scale Datasets of Deep Descriptors
This paper introduces a new dataset of one billion descriptors based on DNNs and reveals the relative inefficiency of IMI-based indexing for such descriptors compared to SIFT data. It also introduces two new indexing structures that, given a similar amount of memory, provide a considerably better trade-off between retrieval speed and recall than the standard Inverted Multi-Index.
FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search
This paper presents the first graph-based ANNS index that reflects corpus updates into the index in real-time without compromising on search performance, and designs FreshDiskANN, a system that can index over a billion points on a workstation with an SSD and limited memory.
Billion-Scale Similarity Search with GPUs
This paper proposes a novel design for k-selection that enables high-accuracy brute-force, approximate, and compressed-domain search based on product quantization, and applies it in different similarity search scenarios.
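
Several of the summaries above refer to brute-force search and to recall as a quality measure. As a rough illustration only, the following pure-Python sketch shows an exact (brute-force) k-nearest-neighbor scan and a recall@k computation against it. The function names `squared_l2`, `brute_force_knn`, and `recall_at_k` are hypothetical and are not taken from any of the listed papers or libraries. Squared Euclidean distance is assumed as the metric.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(base, query, k):
    """Exact k nearest neighbors of `query`, found by scanning every base vector."""
    # Pair each distance with the index of its base vector.
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(base))
    # Select the k smallest distances without sorting the whole list.
    return [i for _, i in heapq.nsmallest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy data (assumed for illustration): four 2-D points and one query.
base = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
exact = brute_force_knn(base, (0.1, 0.1), k=2)

# Pretend an approximate index returned ids [0, 2] for the same query.
print(exact, recall_at_k([0, 2], exact))
```

At billion scale, the full scan is what approximate indices are designed to avoid. It is still useful for computing ground truth on a sample of queries.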