Down the Rabbit Hole: Robust Proximity Search and Density Estimation in Sublinear Space

@article{HarPeled2012DownTR,
  title={Down the Rabbit Hole: Robust Proximity Search and Density Estimation in Sublinear Space},
  author={Sariel Har-Peled and Nirman Kumar},
  journal={2012 IEEE 53rd Annual Symposium on Foundations of Computer Science},
  year={2012},
  pages={430-439}
}
  • Sariel Har-Peled, N. Kumar
  • Published 12 November 2011
  • Computer Science
  • 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science
For a set of n points in R^d, and parameters k and ε, we present a data structure that answers (1 + ε)-approximate k-nearest neighbor queries in logarithmic time. Surprisingly, the space used by the data structure is Õ(n/k); that is, the space used is sublinear in the input size if k is sufficiently large. Our approach provides a novel way to summarize geometric data, such that meaningful proximity queries on the data can be carried out using this sketch. Using this we provide a sublinear… 
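
The Õ(n/k) space bound has a simple sampling intuition behind it: a random sample of roughly n/k points already carries useful information about the distance to the k-th nearest neighbor. The Python sketch below (with hypothetical names build_sketch and approx_kth_nn_distance, assuming Euclidean points) illustrates only that intuition; it is not the paper's construction, which achieves a (1 + ε) guarantee with logarithmic query time, and the estimate here is only a rough constant-factor-style proxy for d_k(q).

import math
import random

def build_sketch(points, k, seed=0):
    """Keep each point independently with probability 1/k.
    The expected sketch size is n/k, i.e. sublinear in n for large k."""
    rng = random.Random(seed)
    return [p for p in points if rng.random() < 1.0 / k]

def approx_kth_nn_distance(sketch, q):
    """Distance from q to its nearest sketch point (brute force here; a real
    structure would answer this with an approximate nearest neighbor index).
    The nearest kept point typically has rank on the order of k among q's
    neighbors, so this distance loosely tracks the k-th NN distance."""
    return min(math.dist(q, p) for p in sketch)

if __name__ == "__main__":
    rng = random.Random(1)
    n, d, k = 20000, 2, 100
    pts = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    sketch = build_sketch(pts, k)                          # about n/k = 200 points
    q = (0.5, 0.5)
    est = approx_kth_nn_distance(sketch, q)
    true_dk = sorted(math.dist(q, p) for p in pts)[k - 1]  # exact d_k(q)
    print(f"sketch size = {len(sketch)}, estimate = {est:.4f}, true d_k = {true_dk:.4f}")

Turning this sampling intuition into a (1 + ε)-approximation answerable in logarithmic time is exactly where the paper's machinery comes in.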

Citations

Sub-linear Memory Sketches for Near Neighbor Search on Streaming Data
TLDR
This work presents the first sublinear memory sketch that can be queried to find the nearest neighbors in a dataset, and its sketch, which consists entirely of short integer arrays, has a variety of attractive features in practice.
Lower bounds for k-distance approximation
TLDR
It is proved that after appropriate rescaling this halving polyhedron is Hausdorff close to the unit ball with high probability, as soon as the number of points grows like Ω(d log(d)).
Robust Proximity Search for Balls Using Sublinear Space
TLDR
If k and ε are provided in advance, a data structure using O(n/k) space can answer such queries; that is, the data structure requires sublinear space if k is sufficiently large.
Nearest-Neighbor Searching Under Uncertainty I
TLDR
These are the first nontrivial methods for answering exact or ε-approximate queries with provable performance guarantees in polylogarithmic or sublinear time, depending on the underlying function.
Geometric Computing over Uncertain Data
Down the Rabbit Hole: Robust Proximity Search and Density Estimation in Sublinear Space
TLDR
This work presents a data structure that answers (1 + ε)-approximate k-nearest neighbor queries in logarithmic time, and provides a novel way to summarize geometric data so that meaningful proximity queries on the data can be carried out.

References

Showing 1-10 of 59 references
Efficient search for approximate nearest neighbor in high dimensional spaces
TLDR
Significantly improving and extending recent results of Kleinberg, this work constructs data structures whose size is polynomial in the size of the database and search algorithms that run in time nearly linear or nearly quadratic in the dimension.
Nearest-Neighbor Searching and Metric Space Dimensions
TLDR
Several measures of dimension can be estimated using nearest-neighbor searching, while others can be used to estimate the cost of that searching.
Witnessed k-distance
TLDR
This paper analyzes an approximation scheme that keeps the representation linear in the size of the input, while maintaining the guarantees on the inference quality close to those for the exact but costly representation.
Space-time tradeoffs for approximate spherical range counting
TLDR
This work presents space-time tradeoffs for approximate spherical range counting queries, broadly based on methods developed for approximate Voronoi diagrams, but it involves a number of significant extensions from the context of nearest neighbor searching to range searching.
Lower bounds for k-distance approximation
TLDR
It is proved that after appropriate rescaling this halving polyhedron is Hausdorff close to the unit ball with high probability, as soon as the number of points grows like Ω(d log(d)).
Approximate nearest neighbors: towards removing the curse of dimensionality
TLDR
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented, requiring space that is only polynomial in n and d while achieving query times that are sublinear in n and polynomial in d.
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
  • Alexandr Andoni, P. Indyk
  • Computer Science
    2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
  • 2006
We present an algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of O(d n^(1/c^2 + o(1))) and space O(dn + n^(1 + 1/c^2 + o(1))). This almost matches…
Optimal partition trees
TLDR
A new method is given that simultaneously achieves O(n log n) preprocessing time, O(n) space, and O(n^(1-1/d)) query time with high probability, and leads to more efficient multilevel partition trees, which are important in many data structure applications.
A Randomized Algorithm for Closest-Point Queries
TLDR
This result approaches the Ω(n^⌈d/2⌉) worst-case time required for any algorithm that constructs the Voronoi...
Efficient partition trees
We prove a theorem on partitioning point sets in E^d (d fixed) and give an efficient construction of partition trees based on it. This yields a simplex range searching structure with linear space, O(n…