Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
@inproceedings{Baranchuk2018RevisitingTI, title={Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors}, author={Dmitry Baranchuk and Artem Babenko and Yury Malkov}, booktitle={European Conference on Computer Vision}, year={2018} }
This work addresses the problem of billion-scale nearest neighbor search. The state-of-the-art retrieval systems for billion-scale databases are currently based on the inverted multi-index, the recently proposed generalization of the inverted index structure. The multi-index provides a very fine-grained partition of the feature space that allows extracting concise and accurate short-lists of candidates for the search queries. In this paper, we argue that the potential of the simple inverted…
50 Citations
Inverted Semantic-Index for Image Retrieval
- Computer Science, ArXiv
- 2022
This paper replaces the clustering method with image classification during codebook construction, and proposes a merging and splitting method to solve the problem that the number of partitions in the inverted semantic-index cannot be changed.
Vector and Line Quantization for Billion-scale Similarity Search on GPUs
- Computer Science, Future Gener. Comput. Syst.
- 2019
SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
- Computer Science, NeurIPS
- 2021
This paper presents a simple but efficient memory-disk hybrid indexing and search system, named SPANN, that follows the inverted index methodology and guarantees both disk-access efficiency and high recall by effectively reducing the number of disk accesses and retrieving high-quality posting lists.
DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
- Computer Science
- 2019
It is demonstrated that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high recall, low query latency, and high density (points indexed per node).
Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval
- Computer Science, WWW
- 2022
This work addresses the problem of massive-scale embedding-based retrieval with Bi-Granular Document Representation, where the lightweight sparse embeddings are indexed and kept on standby in memory for coarse-grained candidate search, and the heavyweight dense embeddings are hosted on disk for fine-grained post verification.
Hierarchical quantization for billion-scale similarity retrieval on GPUs
- Computer Science, Comput. Electr. Eng.
- 2021
Efficient Nearest Neighbor Search by Removing Anti-hub
- Computer Science, ICMR
- 2021
This work empirically finds that vectors unnecessary for search have low hubness scores and can thus be easily identified beforehand; removing these anti-hubs achieves a memory-efficient search while preserving accuracy.
Hybrid Approximate Nearest Neighbor Indexing and Search (HANNIS) for Large Descriptor Databases
- Computer Science, 2022 IEEE International Conference on Big Data (Big Data)
- 2022
A new hybrid method for indexing and searching for the approximate nearest neighbors in high-dimensional large deep-descriptor databases retrieves truly similar items in the database, even if the retrieval set is large.
FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search
- Computer Science, ArXiv
- 2021
FreshDiskANN is presented, a system that can index over a billion points on a workstation with an SSD and limited memory, and support thousands of concurrent real-time inserts, deletes and searches per second each, while retaining 5-10x reduction in the cost of maintaining freshness in indices when compared to existing methods.
HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints
- Computer Science, CIKM
- 2022
HQANN is a simple yet highly efficient hybrid query processing framework which can be easily embedded into existing proximity graph-based ANNS algorithms and guarantees both low latency and high recall by leveraging navigation sense among attributes and fusing vector similarity search with attribute filtering.
24 References
Efficient Indexing of Billion-Scale Datasets of Deep Descriptors
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This paper introduces a new dataset of one billion descriptors based on DNNs, reveals the relative inefficiency of IMI-based indexing for such descriptors compared to SIFT data, and introduces two new indexing structures that provide a considerably better trade-off between the speed of retrieval and recall, given a similar amount of memory, as compared to the standard Inverted Multi-Index.
The Inverted Multi-Index
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015
Inverted multi-indices were able to significantly improve the speed of approximate nearest neighbor search on the dataset of 1 billion SIFT vectors compared to the best previously published systems, while achieving better recall and incurring only a few percent of memory overhead.
Improving Bilayer Product Quantization for Billion-Scale Approximate Nearest Neighbors in High Dimensions
- Computer Science, ArXiv
- 2014
This work introduces and evaluates two approximate nearest neighbor search systems that both exploit the synergy of product quantization processes in a more efficient way and provide significantly better recall for the same runtime at the cost of a small increase in memory footprint.
Object retrieval with large vocabularies and fast spatial matching
- Computer Science, 2007 IEEE Conference on Computer Vision and Pattern Recognition
- 2007
To improve query performance, this work adds an efficient spatial verification stage to re-rank the results returned from the bag-of-words model and shows that this consistently improves search quality, though by less of a margin when the visual vocabulary is large.
Sparse composite quantization
- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
Sparse composite quantization is developed, which constructs sparse dictionaries; the benefit is that distance evaluation between the query and a dictionary element (a sparse vector) is accelerated using efficient sparse vector operations, which greatly reduces the cost of distance table computation.
Efficient Large-Scale Approximate Nearest Neighbor Search on the GPU
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work proposes a two-level product and vector quantization tree that reduces the number of vector comparisons required during tree traversal and includes a novel, highly parallelizable re-ranking method for candidate vectors that efficiently reuses already computed intermediate values.
Polysemous Codes
- Computer Science, ECCV
- 2016
Polysemous codes are introduced, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance; their design is inspired by algorithms introduced in the 1990s to construct channel-optimized vector quantizers.
Searching in one billion vectors: Re-rank with source coding
- Computer Science, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2011
This paper releases a new public dataset of one billion 128-dimensional vectors, proposes an experimental setup to evaluate high-dimensional indexing algorithms at a realistic scale, and accurately and efficiently re-ranks the neighbor hypotheses using little memory compared to the full vector representation.
Fast Neighborhood Graph Search Using Cartesian Concatenation
- Computer Science, 2013 IEEE International Conference on Computer Vision
- 2013
Experimental results on searching over large-scale datasets (SIFT, GIST, and HOG) show that the proposed new data structure for approximate nearest neighbor search outperforms state-of-the-art ANN search algorithms in terms of efficiency and accuracy.
Composite Quantization for Approximate Nearest Neighbor Search
- Computer Science, ICML
- 2014
This paper presents a novel compact coding approach, composite quantization, for approximate nearest neighbor search. The idea is to use the composition of several elements selected from the…