# Learning to Hash Robustly, with Guarantees

@article{Andoni2021LearningTH,
  title={Learning to Hash Robustly, with Guarantees},
  author={Alexandr Andoni and Daniel Beaglehole},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.05433}
}

The indexing algorithms for high-dimensional nearest neighbor search (NNS) with the best worst-case guarantees are based on randomized Locality-Sensitive Hashing (LSH) and its derivatives. In practice, many heuristic approaches exist to "learn" the best indexing method in order to speed up NNS, crucially adapting to the structure of the given dataset. Oftentimes, these heuristics outperform the LSH-based algorithms on real datasets but, almost always, come at the cost of losing the…
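As context for the data-oblivious LSH baseline the paper compares against, here is a minimal sketch of the classic bit-sampling LSH family for Hamming space (Indyk–Motwani); all function names and parameters below are illustrative, not taken from the paper.

```python
import random

def make_bit_sampling_hash(dim, k, seed=0):
    """LSH function for Hamming space: project a binary vector
    onto k randomly chosen coordinates (bit sampling)."""
    rng = random.Random(seed)
    coords = rng.sample(range(dim), k)
    def h(point):
        # point is a sequence of 0/1 values of length dim
        return tuple(point[i] for i in coords)
    return h

def build_index(points, dim, k, num_tables=4):
    """Hash every point into num_tables independent tables;
    nearby points collide in some table with good probability."""
    tables = []
    for t in range(num_tables):
        h = make_bit_sampling_hash(dim, k, seed=t)
        buckets = {}
        for idx, p in enumerate(points):
            buckets.setdefault(h(p), []).append(idx)
        tables.append((h, buckets))
    return tables

def query(tables, q):
    """Collect candidate neighbors from the query's bucket in each table."""
    candidates = set()
    for h, buckets in tables:
        candidates.update(buckets.get(h(q), []))
    return candidates
```

The family is data-oblivious: the sampled coordinates are chosen independently of the dataset, which is exactly the property that data-dependent and learned methods exploit to do better.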

## References

Showing 1–10 of 35 references.

LSH Forest: Practical Algorithms Made Theoretical

- Computer Science, Mathematics · SODA
- 2017

The end result is the first instance of a simple, practical algorithm that provably leverages data-dependent hashing to improve upon data-oblivious LSH, and is provably better than the best LSH algorithm for the Hamming space.

Learning to Hash for Indexing Big Data—A Survey

- Computer Science, Mathematics · Proceedings of the IEEE
- 2016

A comprehensive survey of the learning-to-hash framework and representative techniques of various types, including unsupervised, semisupervised, and supervised, is provided and recent hashing approaches utilizing the deep learning models are summarized.

Optimal Data-Dependent Hashing for Approximate Near Neighbors

- Mathematics, Computer Science · STOC
- 2015

The new bound is not only optimal, but in fact improves over the best LSH data structures (Indyk, Motwani 1998; Andoni, Indyk 2006) for all approximation factors c > 1.

A Heterogeneous High-Dimensional Approximate Nearest Neighbor Algorithm

- Mathematics, Computer Science · IEEE Transactions on Information Theory
- 2012

An old-style probabilistic formulation is introduced instead of the more general locality-sensitive hashing (LSH) formulation, and it is shown that, at least for sparse problems, it yields much more efficient algorithms than the sparseness-destroying LSH random projections.

Practical and Optimal LSH for Angular Distance

- Computer Science, Mathematics · NIPS
- 2015

This work shows the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent and establishes a fine-grained lower bound for the quality of any LSH family for angular distance.
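The baseline hyperplane LSH family for angular distance that this line of work improves on can be sketched as follows (SimHash, Charikar 2002); this is the classic family, not the optimal scheme from the paper, and the function name is illustrative.

```python
import random

def make_hyperplane_hash(dim, k, seed=0):
    """SimHash for angular distance: draw k random hyperplanes through
    the origin; the hash is the sign pattern of the dot products.
    For two vectors at angle theta, a single bit collides with
    probability 1 - theta/pi."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    def h(x):
        return tuple(1 if sum(a * b for a, b in zip(p, x)) >= 0 else 0
                     for p in planes)
    return h
```

Because the hash depends only on the sign of each projection, it is invariant to positive scaling of the input, which is what makes it a family for angular (not Euclidean) distance.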

Refinements to nearest-neighbor searching in k-dimensional trees

- Mathematics, Computer Science · Algorithmica
- 2005

This note presents a simplification and generalization of an algorithm for searching k-dimensional trees for nearest neighbors reported by Friedman et al. [3]; the algorithm is generalized to allow a partition plane to have an arbitrary orientation, rather than insisting that it be perpendicular to a coordinate axis, as in the original algorithm.

Spectral Approaches to Nearest Neighbor Search

- Computer Science, Mathematics · 2014 IEEE 55th Annual Symposium on Foundations of Computer Science
- 2014

In practice, a number of spectral NNS algorithms outperform the random-projection methods that seem otherwise theoretically optimal on worst-case datasets; this work provides theoretical justification for that disparity.

LSH forest: self-tuning indexes for similarity search

- Computer Science · WWW '05
- 2005

This index uses the well-known technique of locality-sensitive hashing (LSH), but improves upon previous designs by eliminating the different data-dependent parameters for which LSH must be constantly hand-tuned, and improving on LSH's performance guarantees for skewed data distributions while retaining the same storage and query overhead.

Learning Space Partitions for Nearest Neighbor Search

- Computer Science · ICLR
- 2020

A new framework for building space partitions is developed, reducing the problem to balanced graph partitioning followed by supervised classification; the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods, as well as classic, data-oblivious LSH.

Improved nearest neighbor search using auxiliary information and priority functions

- Computer Science · ICML
- 2018

This paper exploits properties of single and multiple random projections, which allow storing meaningful auxiliary information at internal nodes of a random projection tree and designing priority functions to guide the search process, resulting in improved nearest neighbor search performance.