The PageRank Citation Ranking: Bringing Order to the Web
This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
Approximate nearest neighbors: towards removing the curse of dimensionality
Two algorithms for the approximate nearest neighbor problem in high-dimensional spaces are presented, which require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d.
Similarity Search in High Dimensions via Hashing
Experimental results indicate that the novel hashing-based scheme for approximate similarity search scales well even for a relatively large number of dimensions, and that it improves running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
Models and issues in data stream systems
This paper motivates the need for, and the research issues arising from, a new model of data processing in which data does not take the form of persistent relations but instead arrives in multiple, continuous, rapid, time-varying data streams.
Dynamic itemset counting and implication rules for market basket data
A new algorithm for finding large itemsets is presented that uses fewer passes over the data than classic algorithms, yet fewer candidate itemsets than methods based on sampling, along with a new way of generating “implication rules” that are normalized based on both the antecedent and the consequent.
Randomized Algorithms
Beyond market baskets: generalizing association rules to correlations
This work develops the notion of mining rules that identify correlations (generalizing associations), and proposes measuring the significance of associations via the chi-squared test for correlation from classical statistics, which reduces the mining problem to the search for a border between correlated and uncorrelated itemsets in the lattice.
Approximate Frequency Counts over Data Streams
This talk will trace the history of the Approximate Frequency Counts paper, how it was conceptualized and how it influenced data stream research.
Proof verification and hardness of approximation problems
The authors improve on their result by showing that NP = PCP(log n, 1), which has the following consequences: (1) MAXSNP-hard problems do not have polynomial time approximation schemes unless P = NP; and (2) for some $\epsilon > 0$ the size of the maximal clique in a graph cannot be approximated within a factor of $n^{\epsilon}$ unless P = NP.
Maintaining Stream Statistics over Sliding Windows
The problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far, is considered, and it is shown that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits of memory, the number of 1's can be estimated to within a factor of $1 + \epsilon$.