Corpus ID: 244130437

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

Qi Chen, Bing Zhao, Haidong Wang, Mingqin Li, Chuanjie Liu, Zengzhong Li, Mao Yang, Jingdong Wang
In-memory algorithms for approximate nearest neighbor search (ANNS) have achieved great success at fast, high-recall search, but they become extremely expensive when handling very large-scale databases. There is thus increasing demand for hybrid ANNS solutions that combine a small amount of memory with inexpensive solid-state drives (SSDs). In this paper, we present a simple but efficient memory-disk hybrid indexing and search system, named SPANN, that follows the inverted index methodology. It stores the…
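The inverted index methodology the abstract refers to can be illustrated with a toy in-memory sketch. In a real memory-disk hybrid system the posting lists would live on SSD and only the centroids would stay in memory; here everything is in RAM, and the function names (`build_ivf`, `search`) and parameters (`n_probe`) are illustrative, not SPANN's actual API.

```python
import numpy as np

def build_ivf(vectors, n_clusters, n_iters=10):
    """Toy k-means clustering producing centroids (kept in memory)
    and posting lists (which a hybrid system would keep on SSD)."""
    rng = np.random.default_rng(0)
    centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)]
    for _ in range(n_iters):
        assign = np.argmin(((vectors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    postings = {c: np.where(assign == c)[0] for c in range(n_clusters)}
    return centroids, postings

def search(query, vectors, centroids, postings, n_probe=2, k=5):
    """Probe the n_probe posting lists whose centroids are closest to the
    query, then rank the gathered candidates by exact distance."""
    order = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
    cand = np.concatenate([postings[c] for c in order])
    dists = ((vectors[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(dists)[:k]]
```

The key trade-off this structure exposes is `n_probe`: probing more posting lists raises recall but costs more disk reads in the hybrid setting.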


Uni-Retriever: Towards Learning The Unified Embedding Based Retriever in Bing Sponsored Search
This paper presents a novel representation learning framework, Uni-Retriever, developed for Bing Search, which unifies two different training modes, knowledge distillation and contrastive learning, to realize the two required objectives of high-relevance and high-CTR retrieval.
Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings
Distill-VQ is proposed, which unifies the learning of IVF and PQ within a knowledge distillation framework and is able to derive substantial training signals from massive unlabeled data, which significantly contributes to retrieval quality.


DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
It is demonstrated that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high recall, low query latency, and high density (points indexed per node).
HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory
A novel graph-based similarity search algorithm called HM-ANN is presented, which takes both memory and data heterogeneity into consideration and enables billion-scale similarity search on a single node without using compression.
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
It is argued that the potential of the simple inverted index was not fully exploited in previous works and advocate its usage both for the highly-entangled deep descriptors and relatively disentangled SIFT descriptors.
GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine
GRIP achieves an order of magnitude improvements on overall system efficiency, significantly reducing the cost of vector search, while attaining equal or higher accuracy, compared with the state-of-the-art.
The Inverted Multi-Index
Inverted multi-indices significantly improved the speed of approximate nearest neighbor search on a dataset of 1 billion SIFT vectors compared to the best previously published systems, while achieving better recall and incurring only a few percent memory overhead.
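The core idea of the inverted multi-index is to split each vector into halves and quantize each half independently, so that K codewords per half index K*K cells without training K*K full-dimensional centroids. A minimal sketch of the cell-assignment step, with hypothetical names (`assign_cells`, `codebook_a`, `codebook_b`) and pre-trained codebooks assumed:

```python
import numpy as np

def assign_cells(vectors, codebook_a, codebook_b):
    """Quantize the first half of each vector against codebook_a and the
    second half against codebook_b; the pair of codeword ids addresses one
    of len(codebook_a) * len(codebook_b) cells."""
    d = vectors.shape[1] // 2
    a = np.argmin(((vectors[:, None, :d] - codebook_a[None]) ** 2).sum(-1), axis=1)
    b = np.argmin(((vectors[:, None, d:] - codebook_b[None]) ** 2).sum(-1), axis=1)
    return a * len(codebook_b) + b  # flat cell id in [0, K_a * K_b)
```

Because the cell grid is much finer than a plain inverted index with the same training cost, each cell holds far fewer points, which is what enables the speed and recall gains reported above.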
Pyramid: A General Framework for Distributed Similarity Search on Large-scale Datasets
Experiments on large-scale datasets show that Pyramid produces quality results for similarity search, achieves high query processing throughput and low latency, and is robust under node failures and stragglers.
Pruned Bi-directed K-nearest Neighbor Graph for Proximity Search
It is shown that a graph can be derived from an approximate neighborhood graph, which costs much less to construct than a KNNG, in the same way as the PBKNNG, and that it also outperforms a KNNG.
Query-driven iterated neighborhood graph search for large scale indexing
This paper presents a criterion to check if the local search over a neighborhood graph arrives at the local solution, and follows the iterated local search (ILS) strategy, widely-used in combinatorial optimization, to find a solution beyond a local optimum.
Fast Approximate Nearest Neighbor Search With Navigating Spreading-out Graphs
This paper proposes an efficient algorithm to build the NSG; the maximum degree of the resulting NSG is very small, so it is quite memory-efficient, and it significantly outperforms state-of-the-art algorithms on both index size and search performance.