A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs

@article{Tellez2021ASS,
  title={A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs},
  author={Eric Sadit Tellez and Guillermo Ruiz and Edgar Ch{\'a}vez and Mario Graff},
  journal={Pattern Analysis and Applications},
  year={2021},
  volume={24},
  pages={763--777}
}
Nearest neighbor search is a powerful abstraction for data access; however, data indexing is troublesome even for approximate indexes. For intrinsically high-dimensional data, high-quality fast searches demand either indexes with impractically large memory usage or preprocessing time. In this paper, we introduce an algorithm to solve a nearest-neighbor query q by minimizing a kernel function defined by the distance from q to each object in the database. The minimization is performed using… 
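The abstract is truncated, so the paper's kernel function and its exact minimization procedure are not visible here. As a rough illustration of the general idea named in the title (local search over a neighbor graph), the sketch below treats the plain distance from q to each object as the objective and minimizes it by greedy descent with random restarts. The function names, the brute-force graph construction, and the restart strategy are all assumptions made for illustration and are not the authors' method.

```python
import math
import random


def dist(a, b):
    """Euclidean distance; stands in for the objective being minimized."""
    return math.dist(a, b)


def build_knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph (illustrative, O(n^2 log n))."""
    graph = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        graph.append(sorted(others, key=lambda j: dist(p, points[j]))[:k])
    return graph


def greedy_search(points, graph, q, start):
    """Move to the neighbor closest to q until no neighbor improves."""
    current, current_d = start, dist(q, points[start])
    while True:
        best, best_d = current, current_d
        for j in graph[current]:
            d = dist(q, points[j])
            if d < best_d:
                best, best_d = j, d
        if best == current:
            # Local minimum of the objective over the graph neighborhood.
            return current, current_d
        current, current_d = best, best_d


def search_with_restarts(points, graph, q, restarts, rng):
    """Repeat greedy descent from random start nodes and keep the best."""
    best, best_d = None, math.inf
    for _ in range(restarts):
        i, d = greedy_search(points, graph, q, rng.randrange(len(points)))
        if d < best_d:
            best, best_d = i, d
    return best, best_d


if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(200)]
    graph = build_knn_graph(points, k=8)
    print(search_with_restarts(points, graph, (0.5, 0.5), restarts=4, rng=rng))
```

Greedy descent can stop at a local minimum that is not the true nearest neighbor, which is why the sketch restarts from several random nodes; the result is approximate in general.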
Similarity search on neighbor's graphs with automatic Pareto optimal performance and minimum expected quality setups based on hyperparameter optimization
TLDR
This manuscript introduces an autotuned algorithm for constructing and searching nearest neighbors based on neighbor graphs and optimization metaheuristics; the approach is described and benchmarked with other state-of-the-art similarity search methods, showing convenience and competitiveness.
SimilaritySearch.jl : Autotuned nearest neighbor indexes for Julia
TLDR
The MIT-licensed Julia package SimilaritySearch.jl is described, which provides algorithms to efficiently retrieve k nearest neighbors from a metric dataset and other related problems with no knowledge of the underlying algorithms, since the main structure, the SearchGraph, has autotuning capabilities.
A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search
TLDR
This study provides a thorough comparative analysis and experimental evaluation of 13 representative graph-based ANNS algorithms via a new taxonomy and fine-grained pipeline, and designs an optimized method that outperforms the state-of-the-art algorithms.

References

Distributed Complementary Binary Quantization for Joint Hash Table Learning
TLDR
The proposed (D-)CBQ exploits prototype-based incomplete binary coding to align the data distributions in the original space and the Hamming space, and further uses multi-index search to jointly reduce the quantization loss.
Complementary Binary Quantization for Joint Multiple Indexing
TLDR
A complementary binary quantization (CBQ) method that jointly learns multiple hash tables. It exploits prototype-based incomplete binary coding to align the original space and the Hamming space, and further uses multi-index search to jointly reduce the quantization loss of the prototype-based hash function.
Finding Near Neighbors Through Local Search
TLDR
Three search algorithms that generalize to local search strategies other than greedy are introduced, and experiments show that this approach significantly improves on the state of the art.
Scalable Distributed Algorithm for Approximate Nearest Neighbor Search Problem in High Dimensional General Metric Spaces
TLDR
Simulations on data in Euclidean space show that the structure built using the proposed algorithm has navigable small world properties with logarithmic search complexity at fixed accuracy and has weak (power law) scalability with the dimensionality of the stored data.
Distance Metric Learning for Large Margin Nearest Neighbor Classification
TLDR
This paper shows how to learn a Mahalanobis distance metric for kNN classification from labeled examples in a globally integrated manner and finds that metrics trained in this way lead to significant improvements in kNN classification.
MI-File: using inverted files for scalable approximate similarity search
TLDR
A new efficient and accurate technique for generic approximate similarity search, based on inverted files, that very efficiently obtains a very small set of good candidates for the query result.
Fast Nearest Neighbor Search with Transformed Residual Quantization
  • Jiangbo Yuan, Xiuwen Liu
  • Computer Science
    2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)
  • 2016
TLDR
This work proposes a new strategy, called transformed RQ (TRQ), that jointly learns a local transformation per residual cluster with the ultimate goal of further reducing overall quantization errors, and proposes a hybrid approximate nearest search method based on the proposed TRQ and PQ.