Locality-sensitive hashing scheme based on p-stable distributions

@inproceedings{Datar2004LocalitysensitiveHS,
  title={Locality-sensitive hashing scheme based on p-stable distributions},
  author={Mayur Datar and Nicole Immorlica and Piotr Indyk and Vahab S. Mirrokni},
  booktitle={SCG '04},
  year={2004}
}
We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions. Our scheme improves the running time of the earlier algorithm for the case of the lp norm. It also yields the first known provably efficient approximate NN algorithm for the case p<1. We also show that the algorithm finds the exact near neighbor in O(log n) time for data satisfying a certain "bounded growth" condition. Unlike earlier schemes, our LSH…
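
The truncated abstract does not spell out the hash function itself. As a minimal sketch, assuming the commonly described form for the Euclidean case, h(v) = floor((a . v + b) / w), with a drawn from a Gaussian (which is 2-stable) and b uniform in [0, w), the following Python illustrates why nearby points collide more often than distant ones. The names `make_pstable_hash`, `collision_rate`, `dim`, and `w` are illustrative and are not taken from the paper.

```python
import math
import random


def make_pstable_hash(dim, w, rng):
    """Build one hash h(v) = floor((a . v + b) / w).

    Assumption: a has i.i.d. N(0, 1) entries (the Gaussian is 2-stable),
    and b is uniform on [0, w). w is the bucket width.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h


def collision_rate(p, q, w, trials, seed=0):
    """Estimate Pr[h(p) == h(q)] over independently drawn hash functions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_pstable_hash(len(p), w, rng)
        hits += h(p) == h(q)
    return hits / trials


if __name__ == "__main__":
    origin = [0.0] * 8
    near = [0.1] * 8   # small Euclidean distance from origin
    far = [3.0] * 8    # large Euclidean distance from origin
    print("near:", collision_rate(origin, near, w=4.0, trials=2000))
    print("far: ", collision_rate(origin, far, w=4.0, trials=2000))
```

Because a . p - a . q is distributed as the Euclidean distance between p and q times a standard Gaussian, the collision probability depends only on that distance and decreases as it grows, which is the locality-sensitive property. A full index would concatenate several such functions per table and use several tables; that part is omitted here.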

Locality-sensitive Hashing Using Stable Distributions 4.1 the Lsh Scheme Based on S-stable Distributions
  • Mathematics
In this chapter, we introduce and analyze a novel locality-sensitive hashing family. The family is defined for the case where the distances are measured according to the l_s norm, for any s ∈ [0, 2].
Locality-Sensitive Hashing for Finding Nearest Neighbors in Probability Distributions
TLDR
This paper presents a novel LSH scheme adapted to angular distance for ANN search in high-dimensional probability distributions, and proposes a Sequential Interleaving algorithm based on the “Unbalance Effect” of Euclidean and angular metrics for probability distributions.
Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets
TLDR
This paper introduces a distribution-free LSH algorithm that reduces the number of hash tables, and is hence memory-efficient, while achieving high accuracy. It shows that the algorithm accurately retrieves nearest neighbors faster than other standard LSH algorithms do, and maintains a nearly uniform number of points per bucket.
A Refined Analysis of LSH for Well-dispersed Data Points
TLDR
This paper presents the first rigorous proof of how LSHs make use of the structure of data points, and provides important insights into parameter setting in the practice of LSH beyond the worst case.
Query Range Sensitive Probability Guided Multi-probe Locality Sensitive Hashing
TLDR
A novel probability model and a query-adaptive algorithm to generate the optimal multi-probe sequence for range queries which takes the query range into account and can probe fewer points than MPLSH for getting the same recall.
Entropy based locality sensitive hashing
TLDR
A set of new entropy-based hash mapping functions for LSH is proposed, so that the distribution of mapped values is approximately uniform, which is the maximum entropy distribution.
Fast locality-sensitive hashing
TLDR
A new and simple method to speed up the widely-used Euclidean realization of LSH by the use of randomized Hadamard transforms in a non-linear setting and shows that using the new LSH in nearest-neighbor applications can improve their running times by significant amounts.
Distributed KNN-graph approximation via hashing
TLDR
This paper introduces a new KNN-join method based on RMMH, a recently introduced hash function family based on randomly trained classifiers, and shows that the resulting hash tables are much more balanced and the number of resulting collisions can be greatly reduced without degrading quality.
Locality-Sensitive Hashing for Chi2 Distance
TLDR
The results prove the relevance of such a new LSH scheme, which either provides far better accuracy in image retrieval than the Euclidean scheme at an equivalent speed, or provides an equivalent accuracy with a high gain in processing speed.

References

Efficient search for approximate nearest neighbor in high dimensional spaces
TLDR
Significantly improving and extending recent results of Kleinberg, data structures whose size is polynomial in the size of the database and search algorithms that run in time nearly linear or nearly quadratic in the dimension are constructed.
Finding nearest neighbors in growth-restricted metrics
TLDR
This paper develops an efficient dynamic data structure for nearest neighbor queries in growth-constrained metrics that satisfy the property that for any point q and number r the ratio between numbers of points in balls of radius 2r and r is bounded by a constant.
Approximate nearest neighbors: towards removing the curse of dimensionality
TLDR
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented, which require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d.
Locally lifting the curse of dimensionality for nearest neighbor search (extended abstract)
TLDR
The idea of aggressive pruning is introduced and a family of practical algorithms, an idealized analysis, and experiments are described that may contribute to improved general purpose algorithms for high dimensions.
Two algorithms for nearest-neighbor search in high dimensions
TLDR
A new approach to the nearest-neighbor problem is developed, based on a method for combining randomly chosen one-dimensional projections of the underlying point set, which results in an algorithm for finding ε-approximate nearest neighbors with a query time of O((d log d)(d + log n)).
Similarity Search in High Dimensions via Hashing
TLDR
Experimental results indicate that the novel scheme for approximate similarity search based on hashing scales well even for a relatively large number of dimensions, and provides experimental evidence that the method gives improvement in running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
Scalable Techniques for Clustering the Web
TLDR
This paper aims to efficiently cluster similar pages on the web, using the technique of Locality-Sensitive Hashing (LSH), in which web pages are hashed in such a way that similar pages have a much higher probability of collision than dissimilar pages.
Nearest neighbor queries in metric spaces
TLDR
The preprocessing algorithm for M(S,Q) can be used to solve the all nearest neighbor problem for S in O(n (log n)^2 (log ϒ(S))^2) expected time, and the resource bounds increase linearly in K.
Navigating nets: simple algorithms for proximity search
TLDR
This work presents a simple deterministic data structure for maintaining a set S of points in a general metric space, while supporting proximity search and updates to S (insertions and deletions) and is essentially optimal in a certain model of distance computation.
Fast mining of massive tabular data via approximate distance computations
TLDR
The methods are for computing the "distance" between any two subregions of tabular data: they are approximate, but highly accurate as the authors prove mathematically, and they are fast, running in time nearly linear in the table size.