The nearest neighbor problem is the following: Given a set of n points P = {p_1, ..., p_n} in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q ∈ X. We focus on the particularly interesting case of the d-dimensional Euclidean space where X = ℝ^d under some l_p norm. Despite…
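As a point of reference for the problem statement above, the following is a minimal sketch of the trivial baseline: no preprocessing, and a linear scan over P for each query. The function names `lp_distance` and `nearest_neighbor` are illustrative and do not come from the paper; the sketch assumes p >= 1 and points given as equal-length sequences of numbers.

```python
from typing import Sequence, Tuple


def lp_distance(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    """l_p distance between two points of equal dimension (assumes p >= 1)."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_neighbor(
    points: Sequence[Sequence[float]], q: Sequence[float], p: float = 2.0
) -> Tuple[int, float]:
    """Return (index, distance) of the point in `points` closest to q.

    This is the O(n * d) linear scan, i.e. the cost that preprocessing-based
    data structures try to beat at query time.
    """
    if not points:
        raise ValueError("point set is empty")
    best = min(range(len(points)), key=lambda i: lp_distance(points[i], q, p))
    return best, lp_distance(points[best], q, p)
```

For example, `nearest_neighbor([(0, 0), (3, 4), (1, 1)], (1, 2))` scans all three points and selects the third one.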
We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under <i>l</i><sub>p</sub> norm, based on <i>p</i>-stable distributions. Our scheme improves the running time of the earlier algorithm for the case of the <i>l</i><sub>p</sub> norm. It also yields the first known provably efficient approximate NN algorithm for…
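A hash function built on p-stable distributions is commonly written as h(v) = floor((a · v + b) / w), where the entries of a are drawn i.i.d. from a p-stable distribution (Gaussian for p = 2, Cauchy for p = 1) and b is uniform in [0, w). The sketch below illustrates that form under stated assumptions: the name `make_pstable_hash`, the default bucket width `w = 4.0`, and the restriction to p in {1, 2} are choices made here for illustration, not details taken from the abstract.

```python
import math
import random
from typing import Callable, Optional, Sequence


def make_pstable_hash(
    dim: int, w: float = 4.0, p: int = 2, seed: Optional[int] = None
) -> Callable[[Sequence[float]], int]:
    """Build one hash function h(v) = floor((a . v + b) / w).

    a has i.i.d. p-stable entries; b is uniform in [0, w).
    Only p = 1 (Cauchy) and p = 2 (Gaussian) are handled in this sketch.
    """
    rng = random.Random(seed)
    if p == 2:
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    elif p == 1:
        # Standard Cauchy sample via the inverse CDF: tan(pi * (u - 1/2)).
        a = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]
    else:
        raise ValueError("this sketch only supports p = 1 or p = 2")
    b = rng.uniform(0.0, w)

    def h(v: Sequence[float]) -> int:
        if len(v) != dim:
            raise ValueError("vector has the wrong dimension")
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h
```

Nearby points tend to land in the same bucket because their projections onto a differ by a small amount relative to w. In practice several such functions are concatenated and several tables are used, which this sketch does not show.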
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks…
In this article, we show several results obtained by combining the use of <i>stable distributions</i> with <i>pseudorandom generators for bounded space</i>. In particular, we show that, for any <i>p</i> &#8712; (0, 2], one can maintain (using only <i>O</i>(log <i>n</i>/&epsi;<sup>2</sup>) words of storage) a <i>sketch</i> <i>C(q)</i> of a point <i>q</i>…
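One common way to build a sketch from stable distributions, shown here for p = 1 only, is to store dot products of the point with random Cauchy vectors and to estimate an l_1 distance as the median of the absolute differences between two sketches. The code below is a simplified illustration of that idea, not necessarily the construction in the article: it stores the random matrix explicitly instead of using a pseudorandom generator, and the names `cauchy_matrix`, `sketch`, and `estimate_l1` are assumptions introduced here.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence


def cauchy_matrix(k: int, dim: int, seed: Optional[int] = None) -> List[List[float]]:
    """k x dim matrix of i.i.d. standard Cauchy entries (1-stable)."""
    rng = random.Random(seed)
    return [
        [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(dim)]
        for _ in range(k)
    ]


def sketch(matrix: Sequence[Sequence[float]], q: Sequence[float]) -> List[float]:
    """Linear sketch of q: one dot product per row of the matrix."""
    return [sum(a * x for a, x in zip(row, q)) for row in matrix]


def estimate_l1(sk_a: Sequence[float], sk_b: Sequence[float]) -> float:
    """Estimate the l_1 distance between the two sketched points.

    Each coordinate difference is distributed as ||a - b||_1 times a standard
    Cauchy variable, and the median of |Cauchy| is 1, so the sample median of
    the absolute differences concentrates around ||a - b||_1.
    """
    return statistics.median(abs(x - y) for x, y in zip(sk_a, sk_b))
```

Because the sketch is linear, it can be updated coordinate by coordinate as the point changes, which is what makes this style of sketch useful for streaming data. More rows (larger k) give a tighter estimate.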
There are two main algorithmic approaches to sparse signal recovery: geometric and combinatorial. The geometric approach utilizes geometric properties of the measurement matrix Φ. A notable example is the Restricted Isometry Property, which states that the mapping Φ approximately preserves the Euclidean norm of sparse signals; it is known that random dense matrices…
Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar web documents,…
We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in ℝ^d, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the…