# Proximity in the Age of Distraction: Robust Approximate Nearest Neighbor Search

@inproceedings{HarPeled2017ProximityIT,
  title={Proximity in the Age of Distraction: Robust Approximate Nearest Neighbor Search},
  author={Sariel Har-Peled and Sepideh Mahabadi},
  booktitle={SODA},
  year={2017}
}
• Published in SODA, 2017 (preprint posted 23 November 2015)
• Computer Science, Mathematics
We introduce a new variant of the nearest neighbor search problem, which allows some coordinates of the dataset to be arbitrarily corrupted or unknown. Formally, given a dataset of $n$ points $P=\{ x_1,\ldots, x_n\}$ in high dimensions and a parameter $k$, the goal is to preprocess the dataset such that, given a query point $q$, one can quickly compute a point $x \in P$ whose distance to the query is minimized when ignoring the "optimal" $k$ coordinates. Note…
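The objective in the abstract can be made concrete with a brute-force baseline. This is not the paper's data structure (which answers such queries approximately after preprocessing); it is only a sketch of the "distance ignoring the optimal $k$ coordinates", with illustrative function names of my own:

```python
def robust_distance(q, x, k):
    """Squared Euclidean distance between q and x when the k coordinates
    contributing most to the distance are ignored -- the "optimal" k
    coordinates to drop for this particular pair."""
    diffs = sorted((qi - xi) ** 2 for qi, xi in zip(q, x))
    # Drop the k largest coordinate-wise contributions.
    return sum(diffs[:len(diffs) - k])

def robust_nearest_neighbor(P, q, k):
    """Linear-scan baseline: O(n d log d) per query. The paper's
    contribution is answering this approximately in sublinear time."""
    return min(P, key=lambda x: robust_distance(q, x, k))
```

Note that the dropped coordinates depend on the pair $(q, x)$, which is what makes the problem harder than ordinary nearest neighbor search: a point far away in one corrupted coordinate can still be the robust nearest neighbor.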
## 8 Citations

### LSH on the Hypercube Revisited

• Computer Science, Mathematics
ArXiv
• 2017

This note revisits the most basic setting, where P is a set of points in the binary hypercube {0, 1}^d under the L1/Hamming metric, and presents a short description of the LSH scheme in this case.
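The basic hypercube LSH scheme in question is bit sampling (Indyk–Motwani): hash a point by projecting onto a few random coordinates. A minimal sketch, with function and parameter names of my own:

```python
import random

def make_bit_sampler(d, t, seed=None):
    """One LSH function for {0,1}^d under Hamming distance: project onto
    t coordinates chosen uniformly at random. Two points at Hamming
    distance r collide with probability (1 - r/d)^t, so near pairs
    collide more often than far pairs."""
    rng = random.Random(seed)
    coords = [rng.randrange(d) for _ in range(t)]
    return lambda p: tuple(p[i] for i in coords)
```

In a full scheme one draws several independent such functions, buckets the dataset by each hash, and at query time inspects only the buckets the query falls into.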

### Sublinear algorithms for massive data problems

This thesis presents algorithms and proves lower bounds for fundamental computational problems in models that address massive data sets, and introduces theoretical problems and concepts that model computational issues arising in databases, computer vision, and other areas.

### Analysis of the Period Recovery Error Bound

• Computer Science
ESA
• 2020
This paper provides the first analysis of the relationship between the error bound and the number of candidates, identifies the error parameters that still guarantee recovery, and gives a hierarchy of more restrictive upper error bounds that asymptotically reduces the size of the potential period candidate set.

### Index Structures for Fast Similarity Search for Binary Vectors

Index structures are presented that are based on hash tables and similarity-preserving hashing and also on tree structures, neighborhood graphs, and distributed neural autoassociation memory for fast similarity search for objects represented by binary vectors.

## References

Showing 1–10 of 30 references.

### Approximate line nearest neighbor in high dimensions

• Computer Science, Mathematics
SODA
• 2009
This work considers the problem of approximate nearest neighbors in high dimensions when the queries are lines, and designs a data structure to efficiently support the following query: given a line L, report the point p ∈ P closest to L.

### Approximate k-flat Nearest Neighbor Search

• Computer Science
STOC
• 2015
This work presents the first efficient data structure that can handle approximate nearest neighbor queries for arbitrary k, and generalizes the techniques of AIKN for 1-ANN: the authors partition P into clusters of increasing radius, and build a low-dimensional data structure for a random projection of P.

### Entropy based nearest neighbor search in high dimensions

The problem of finding the approximate nearest neighbor of a query point in high-dimensional space is studied, focusing on the Euclidean case, and it is shown that the c-approximate nearest neighbor can be computed in time n^ρ and near-linear space, where ρ ≈ 2.06/c as c becomes large.

### Two algorithms for nearest-neighbor search in high dimensions

A new approach to the nearest-neighbor problem is developed, based on a method for combining randomly chosen one-dimensional projections of the underlying point set, which results in an algorithm for finding ε-approximate nearest neighbors with a query time of O((d log d)(d + log n)).

### Approximate Nearest Line Search in High Dimensions

The bounds achieved by the data structure match the performance of the best algorithm for the approximate nearest neighbor problem for point sets, and are the first high-dimensional data structure for this problem with poly-logarithmic query time and polynomial space.

### Efficient search for approximate nearest neighbor in high dimensional spaces

• Computer Science
STOC '98
• 1998
Significantly improving and extending recent results of Kleinberg, data structures whose size is polynomial in the size of the database and search algorithms that run in time nearly linear or nearly quadratic in the dimension are constructed.

### An Optimal Randomized Cell Probe Lower Bound for Approximate Nearest Neighbor Searching

• Computer Science
SIAM J. Comput.
• 2010
The approximate nearest neighbor search problem on the Hamming cube is considered, and it is shown that a randomized cell probe algorithm using polynomial storage and word size d^{O(1)} requires a worst-case query time of Ω(log log d / log log log d), and that considerations of bit complexity alone cannot prove any nontrivial cell probe lower bound for the problem.

### Optimal Data-Dependent Hashing for Approximate Near Neighbors

• Computer Science
STOC
• 2015
The new bound is not only optimal, but in fact improves over the best LSH data structures (Indyk–Motwani 1998; Andoni–Indyk 2006) for all approximation factors c > 1.

### Locality-sensitive hashing scheme based on p-stable distributions

• Computer Science
SCG '04
• 2004
A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the ℓp norm, based on p-stable distributions, that improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case p < 1.
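For the 2-stable (Gaussian) case p = 2, each hash function in this scheme has the form h(v) = ⌊(a·v + b)/w⌋. A sketch under that assumption, with the bucket width w and function names chosen here for illustration:

```python
import math
import random

def make_pstable_hash(d, w, seed=None):
    """One LSH function for l2 (p = 2): a has i.i.d. N(0, 1) entries
    (the 2-stable distribution), b is uniform in [0, w), and
    h(v) = floor((<a, v> + b) / w). Nearby points fall into the same
    width-w slot of the random projection with higher probability."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
```

Stability is what makes this work: a·v is distributed like ‖v‖₂ times a single stable variable, so the collision probability of two points depends only on their distance.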

### Beyond Locality-Sensitive Hashing

• Computer Science, Mathematics
SODA
• 2014
By a standard reduction, a new data structure is presented for the Hamming space and ℓ1 norm with ρ ≤ 7/(8c) + O(1/c^{3/2}) + o_c(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).