Corpus ID: 49556397

Approximate Nearest Neighbors in Limited Space

@inproceedings{Indyk2018ApproximateNN,
  title={Approximate Nearest Neighbors in Limited Space},
  author={Piotr Indyk and Tal Wagner},
  booktitle={COLT},
  year={2018}
}
We consider the $(1+\epsilon)$-approximate nearest neighbor search problem: given a set $X$ of $n$ points in a $d$-dimensional space, build a data structure that, given any query point $y$, finds a point $x \in X$ whose distance to $y$ is at most $(1+\epsilon) \min_{x' \in X} \|x'-y\|$ for an accuracy parameter $\epsilon \in (0,1)$. Our main result is a data structure that occupies only $O(\epsilon^{-2} n \log(n) \log(1/\epsilon))$ bits of space, assuming all point coordinates are integers in the…
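The guarantee is easiest to see against a brute-force baseline. Below is a minimal Python sketch, not the paper's data structure: it assumes a known lower bound r on the query's distance to X, and shows how snapping integer coordinates to a coarser grid preserves a $(1+O(\epsilon))$ answer while cutting the bits per coordinate; all names are illustrative.

```python
import numpy as np

def nn_query(X, y):
    """Brute-force nearest neighbor: the exact answer, which trivially
    satisfies the (1+eps) guarantee but stores all original coordinates."""
    return X[np.argmin(np.linalg.norm(X - y, axis=1))]

def quantize(X, eps, r):
    """Snap coordinates to a grid of cell width eps*r/sqrt(d), where r is
    an assumed lower bound on the query's distance to X. Each point moves
    by at most sqrt(d)*cell/2 = eps*r/2, so answering queries on the
    quantized points is (1+O(eps))-approximate at distance scale r, while
    each coordinate now needs correspondingly fewer bits."""
    cell = eps * r / np.sqrt(X.shape[1])
    return np.round(X / cell) * cell
```

Dividing the paper's bound by $n$, its structure uses only $O(\epsilon^{-2} \log(n) \log(1/\epsilon))$ bits per point, well below what storing quantized coordinates at every relevant scale would cost.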

Citations

On Adaptive Distance Estimation

TLDR
A generic approach is given for transforming randomized Monte Carlo data structures that do not support adaptive queries into ones that do; for the problem at hand, it applies to standard nonadaptive solutions to norm estimation with negligible overhead in query time and a factor $d$ overhead in memory.

RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data

TLDR
An online sketching algorithm is developed that compresses vectors into a tiny sketch consisting of small arrays of counters, whose size scales as $O(N^{b}\log^2{N})$, where $b < 1$ depends on the stability of the near-neighbor search.
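To make "small arrays of counters" concrete, here is a minimal RACE-style sketch assuming signed-random-projection (SimHash) LSH; the class name, parameters, and estimator are illustrative, not the authors' API.

```python
import numpy as np

class CounterArraySketch:
    """R independent LSH functions, each indexing a small array of 2**p
    counters. Inserting a vector increments one counter per repetition;
    a query averages the counters its own hashes land in, estimating how
    many stored vectors collide with it (a proxy for near-neighbor
    density). Memory is R * 2**p counters, independent of the data size."""

    def __init__(self, dim, reps=50, bits=4, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((reps, bits, dim))  # SimHash hyperplanes
        self.counts = np.zeros((reps, 2 ** bits), dtype=np.int64)

    def _hash(self, x):
        signs = (self.planes @ x > 0).astype(np.int64)    # (reps, bits) sign pattern
        return signs @ (1 << np.arange(signs.shape[1]))   # pack bits -> bucket index

    def insert(self, x):
        self.counts[np.arange(len(self.counts)), self._hash(x)] += 1

    def query(self, y):
        return self.counts[np.arange(len(self.counts)), self._hash(y)].mean()
```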

Scalable Nearest Neighbor Search for Optimal Transport

TLDR
This work introduces Flowtree, a variant of the quadtree algorithm, formally proves that it achieves asymptotically better accuracy, and shows experimentally that Flowtree improves over various baselines and existing methods in either running time or accuracy.

Sub-linear Memory Sketches for Near Neighbor Search on Streaming Data

TLDR
This work presents the first sublinear memory sketch that can be queried to find the nearest neighbors in a dataset, and its sketch, which consists entirely of short integer arrays, has a variety of attractive features in practice.

Breaking the Linear Iteration Cost Barrier for Some Well-known Conditional Gradient Methods Using MaxIP Data-structures

TLDR
This work provides a formal framework for combining locality-sensitive-hashing-based approximate MaxIP data structures with CGM (conditional gradient method) algorithms, yielding the first algorithms whose cost per iteration is sublinear in the number of parameters for many fundamental optimization methods, e.g., Frank-Wolfe, herding, and policy gradient.
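To see where the MaxIP structure enters, here is plain Frank-Wolfe over a polytope given by its vertex set (a hedged sketch with illustrative names); the linear minimization step is exactly the inner-product search these results replace with an approximate MaxIP query.

```python
import numpy as np

def frank_wolfe(grad, vertices, x0, steps=100):
    """Plain Frank-Wolfe over conv(vertices). Each iteration solves
    argmin_v <grad(x), v> over the vertices -- equivalently a MaxIP query
    on -grad(x). The linear scan below costs O(#vertices * dim) per step;
    replacing it with an approximate MaxIP data structure is what makes
    the per-iteration cost sublinear in the number of parameters."""
    x = x0.astype(float)
    for t in range(steps):
        g = grad(x)
        v = vertices[np.argmin(vertices @ g)]  # linear minimization oracle
        x = x + (2.0 / (t + 2)) * (v - x)      # standard step size 2/(t+2)
    return x
```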

Optimal (Euclidean) Metric Compression

TLDR
The results establish that Euclidean metric compression is possible beyond dimension reduction, and mark the first improvement over compression schemes based on discretizing the classical dimensionality reduction theorem of Johnson and Lindenstrauss.

Sublinear Time Algorithm for Online Weighted Bipartite Matching

TLDR
This work provides the theoretical foundation for computing the weights approximately and shows that, with the proposed randomized data structures, the weights can be computed in sublinear time while still preserving the competitive ratio of the matching algorithm.

Multiclass Classification via Class-Weighted Nearest Neighbors

TLDR
A variant of the k-nearest neighbor classifier with non-uniform class weights is considered, for which upper and minimax lower bounds on accuracy, class-weighted risk, and uniform error are derived.
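A minimal version of the classifier under study (illustrative code, not the authors' implementation):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k, class_weights):
    """Class-weighted k-NN: find the k nearest training points, then pick
    the class with the largest weighted vote. class_weights maps each
    label to a weight; upweighting rare classes trades plain accuracy
    for better class-weighted risk."""
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbors = y_train[np.argsort(dists)[:k]]
    votes = {c: class_weights[c] * np.sum(neighbors == c)
             for c in np.unique(y_train)}
    return max(votes, key=votes.get)
```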

A neural data structure for novelty detection

TLDR
This work finds that the fruit fly olfactory circuit evolved a variant of a Bloom filter to assess the novelty of odors, and develops a class of distance- and time-sensitive Bloom filters that outperform prior filters when evaluated on several biological and computational datasets.
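For reference, here is the classical Bloom filter that the fly circuit's variant builds on; the paper's distance- and time-sensitive versions change how bits are set and decay over time, which this minimal sketch omits.

```python
import hashlib

class BloomFilter:
    """Classical Bloom filter: an m-bit array and k hash functions.
    insert() sets k bits; contains() reports True iff all k bits are set,
    so it may return false positives but never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _indices(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def insert(self, item):
        for idx in self._indices(item):
            self.bits[idx] = 1

    def contains(self, item):
        return all(self.bits[idx] for idx in self._indices(item))
```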

Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

TLDR
This paper develops and employs two novel inner product search data structures that improve over the prior fastest algorithm (NeurIPS 2021), speeding up the class of optimization algorithms known as Frank-Wolfe methods.

References

Showing 1-10 of 31 references

Practical Data-Dependent Metric Compression with Provable Guarantees

TLDR
A new distance-preserving compact representation of multi-dimensional point sets is introduced that almost matches the recent bound of Indyk and Wagner [2017] while being much simpler, and it is compared experimentally to Product Quantization (PQ), a state-of-the-art heuristic metric compression method.

Efficient search for approximate nearest neighbor in high dimensional spaces

TLDR
Significantly improving and extending recent results of Kleinberg, this work constructs data structures whose size is polynomial in the size of the database, with search algorithms that run in time nearly linear or nearly quadratic in the dimension.

Near-Optimal (Euclidean) Metric Compression

TLDR
This paper considers metrics induced by the ℓ2 and ℓ1 norms whose spread (the ratio of the diameter to the closest pair distance) is bounded by Φ > 0, and provides a sketch of size O(n log(1/ϵ) + log log Φ) bits per point, which is shown to be optimal.
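One plausible reading of where the log log Φ term comes from (a sanity check on the stated definition of spread, not the paper's argument): after rescaling the closest pair distance to 1, all distances lie in [1, Φ], so the relevant scale for a point is one of about log₂ Φ powers of two, and naming that scale index costs

$$\lceil \log_2(\lceil \log_2 \Phi \rceil + 1) \rceil = O(\log \log \Phi) \text{ bits per point.}$$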

A Survey on Learning to Hash

TLDR
This paper presents a comprehensive survey of learning-to-hash algorithms, categorizes them by how they preserve similarity (pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization), and discusses their relations.

Learning to Hash for Indexing Big Data—A Survey

TLDR
A comprehensive survey of the learning-to-hash framework and representative techniques of various types, including unsupervised, semisupervised, and supervised, is provided, and recent hashing approaches utilizing deep learning models are summarized.

Almost Optimal Explicit Johnson-Lindenstrauss Families

TLDR
This work gives explicit constructions of linear embeddings satisfying the Johnson-Lindenstrauss property with an almost optimal use of randomness, and shows a lower bound of $\Omega(\log(1/\delta)/\epsilon^2)$ on the embedding dimension.

Database-friendly random projections

TLDR
This work gives a novel construction of a Johnson-Lindenstrauss-style embedding into k-dimensional Euclidean space, suitable for database applications, which amounts to computing a simple aggregate over k random attribute partitions.
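The construction is simple enough to state in full: a minimal sketch of the database-friendly projection, with entries in {+√(3/k), 0, −√(3/k)} (parameter names illustrative).

```python
import numpy as np

def achlioptas_projection(d, k, seed=0):
    """Database-friendly JL matrix: entries are +sqrt(3/k) w.p. 1/6,
    0 w.p. 2/3, and -sqrt(3/k) w.p. 1/6. Since two-thirds of the entries
    vanish, x @ R reduces to signed sums over a random third of the
    attributes -- a simple aggregate per output coordinate."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0 / k) * signs

# Usage: project n points in d dims down to k dims, e.g.
# X_low = X @ achlioptas_projection(d=X.shape[1], k=50)
```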

Nearest-Neighbor Methods in Learning and Vision: Theory and Practice (Neural Information Processing)

TLDR
This volume presents theoretical and practical discussions of nearest-neighbor (NN) methods in machine learning and examines computer vision as an application domain in which the benefit of these advanced methods is often dramatic.

Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality

TLDR
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented for data sets of size $n$ living in $\mathbb{R}^d$, achieving query times that are sub-linear in $n$ and polynomial in $d$.
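A minimal index in the spirit of this line of work; for brevity it uses random-hyperplane hashes (Charikar's later scheme) rather than the paper's original Hamming-space construction, and all names are illustrative.

```python
import numpy as np
from collections import defaultdict

class LSHIndex:
    """L hash tables, each keyed by a p-bit random-hyperplane hash.
    Near points collide in some table with good probability, so a query
    scans only a few buckets instead of the whole data set."""

    def __init__(self, dim, tables=10, bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((tables, bits, dim))
        self.buckets = [defaultdict(list) for _ in range(tables)]

    def _keys(self, x):
        signs = self.planes @ x > 0            # (tables, bits) sign pattern
        return [tuple(row) for row in signs]

    def insert(self, idx, x):
        for table, key in zip(self.buckets, self._keys(x)):
            table[key].append(idx)

    def candidates(self, y):
        out = set()
        for table, key in zip(self.buckets, self._keys(y)):
            out.update(table[key])
        return out                             # verify candidates by exact distance
```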

Beating the Direct Sum Theorem in Communication Complexity with Implications for Sketching

TLDR
Lower bounds obtained from the direct sum result show that a number of techniques in the sketching literature are optimal, including the following: (JL transform) a lower bound of…