Approximate nearest neighbors: towards removing the curse of dimensionality
TLDR
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented, which require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d.
Similarity Search in High Dimensions via Hashing
TLDR
Experimental results indicate that the novel hashing-based scheme for approximate similarity search scales well even for a relatively large number of dimensions, and that it improves running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
Locality-sensitive hashing scheme based on p-stable distributions
TLDR
A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the $l_p$ norm, based on p-stable distributions, is presented; it improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case $p<1$.
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
TLDR
An algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space is presented, achieving query time of $O(dn^{1/c^2+o(1)})$ and space $O(dn + n^{1+1/c^2+o(1)})$; this almost matches the lower bound for hashing-based algorithms recently obtained in [27].
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
We present an algorithm for the c-approximate nearest neighbor problem in a d-dimensional Euclidean space, achieving query time of $O(dn^{1/c^2+o(1)})$ and space $O(dn + n^{1+1/c^2+o(1)})$. This almost matches ...
Enhanced hypertext categorization using hyperlinks
TLDR
This work has developed a text classifier that misclassified only 13% of the documents in the well-known Reuters benchmark; this was comparable to the best results ever obtained and its technique also adapts gracefully to the fraction of neighboring documents having known topics.
Maintaining Stream Statistics over Sliding Windows
TLDR
The problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far, is considered, and it is shown that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, the number of 1's can be estimated to within a factor of $1 + \epsilon$.
Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality
TLDR
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in $\mathbb{R}^d$, are presented, achieving query times that are sub-linear in n and polynomial in d.
Practical and Optimal LSH for Angular Distance
TLDR
This work shows the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent and establishes a fine-grained lower bound for the quality of any LSH family for angular distance.
Stable distributions, pseudorandom generators, embeddings, and data stream computation
  • P. Indyk
  • Mathematics, Computer Science
  • JACM
  • 1 May 2006
TLDR
The aforementioned sketching approach directly translates into an approximate algorithm that solves the main open problem of Feigenbaum et al.