SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
@inproceedings{Chen2021SPANNHB,
  title={SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search},
  author={Qi Chen and Bing Zhao and Haidong Wang and Mingqin Li and Chuanjie Liu and Zengzhong Li and Mao Yang and Jingdong Wang},
  booktitle={NeurIPS},
  year={2021}
}
In-memory algorithms for approximate nearest neighbor search (ANNS) have achieved great success in fast, high-recall search, but they are extremely expensive when handling very large-scale databases. Thus, there is an increasing demand for hybrid ANNS solutions that combine a small amount of memory with inexpensive solid-state drives (SSDs). In this paper, we present a simple but efficient memory-disk hybrid indexing and search system, named SPANN, that follows the inverted index methodology. It stores the…
2 Citations
Uni-Retriever: Towards Learning The Unified Embedding Based Retriever in Bing Sponsored Search
- Computer Science
- ArXiv
- 2022
This paper presents a novel representation learning framework, Uni-Retriever, developed for Bing Search, which unifies two training modes, knowledge distillation and contrastive learning, to meet the two required objectives of high-relevance and high-CTR retrieval.
Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings
- Computer Science
- ArXiv
- 2022
Distill-VQ is proposed, which unifies the learning of IVF and PQ within a knowledge distillation framework and can derive substantial training signals from massive unlabeled data, significantly improving retrieval quality.
References
DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
- Computer Science
- 2019
It is demonstrated that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high recall, low query latency, and high density (points indexed per node).
HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory
- Computer Science
- NeurIPS
- 2020
A novel graph-based similarity search algorithm called HM-ANN is presented, which takes both memory and data heterogeneity into consideration and enables billion-scale similarity search on a single node without using compression.
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
- Computer Science
- ECCV
- 2018
It is argued that the potential of the simple inverted index was not fully exploited in previous works, and its usage is advocated both for highly entangled deep descriptors and for relatively disentangled SIFT descriptors.
GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine
- Computer Science
- CIKM
- 2019
GRIP achieves an order-of-magnitude improvement in overall system efficiency, significantly reducing the cost of vector search while attaining equal or higher accuracy compared with the state of the art.
The Inverted Multi-Index
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2012
Inverted multi-indices were able to significantly improve the speed of approximate nearest neighbor search on a dataset of 1 billion SIFT vectors compared to the best previously published systems, while achieving better recall and incurring only a few percent of memory overhead.
Pyramid: A General Framework for Distributed Similarity Search on Large-scale Datasets
- Computer Science
- 2019 IEEE International Conference on Big Data (Big Data)
- 2019
Experiments on large-scale datasets show that Pyramid produces quality results for similarity search, achieves high query processing throughput and low latency, and is robust to node failures and stragglers.
Pruned Bi-directed K-nearest Neighbor Graph for Proximity Search
- Computer Science
- SISAP
- 2016
It is shown that a graph can be derived from an approximate neighborhood graph, which costs much less to construct than a KNNG, in the same way as the PBKNNG, and that it also outperforms a KNNG.
Query-driven iterated neighborhood graph search for large scale indexing
- Computer Science
- ACM Multimedia
- 2012
This paper presents a criterion to check whether the local search over a neighborhood graph has arrived at a local solution, and follows the iterated local search (ILS) strategy, widely used in combinatorial optimization, to find a solution beyond a local optimum.
Fast Approximate Nearest Neighbor Search With Navigating Spreading-out Graphs
- Computer Science
- ArXiv
- 2017
This paper proposes an efficient algorithm to build the NSG. The maximum degree of the resulting NSG is very small, so it is quite memory-efficient, and it significantly outperforms state-of-the-art algorithms in both index size and search performance.