The PageRank Citation Ranking: Bringing Order to the Web
The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the…
  • 12,336
  • 1,658
Approximate nearest neighbors: towards removing the curse of dimensionality
We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in R^d, the algorithms require space that is only polynomial in n…
  • 3,796
  • 520
Similarity Search in High Dimensions via Hashing
The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building…
  • 3,087
  • 508
Models and issues in data stream systems
In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives…
  • 2,810
  • 163
Randomized Algorithms
For many applications, a randomized algorithm is either the simplest or the fastest algorithm available, and sometimes both. This book introduces the basic concepts in the design and analysis of…
  • 1,909
  • 147
Dynamic itemset counting and implication rules for market basket data
We consider the problem of analyzing market-basket data and present several important contributions. First, we present a new algorithm for finding large itemsets which uses fewer passes over the data…
  • 2,142
  • 121
Beyond market baskets: generalizing association rules to correlations
One of the most well-studied problems in data mining is mining for association rules in market basket data. Association rules, whose significance is measured via support and confidence, are intended…
  • 1,542
  • 98
Approximate Frequency Counts over Data Streams
Research in data stream algorithms has blossomed since the late 90s. The talk will trace the history of the Approximate Frequency Counts paper, how it was conceptualized and how it influenced data stream…
  • 803
  • 84
Maintaining Stream Statistics over Sliding Windows
We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We…
  • 776
  • 64
Proof verification and hardness of approximation problems
The class PCP(f(n),g(n)) consists of all languages L for which there exists a polynomial-time probabilistic oracle machine that uses O(f(n)) random bits, queries O(g(n)) bits of its oracle and…
  • 1,031
  • 61