• Publications
Similarity Search in High Dimensions via Hashing
The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building…
  • 3,087
  • 508
Finding high-quality content in social media
The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content sites based on…
  • 1,174
  • 79
Maintaining Stream Statistics over Sliding Windows
We consider the problem of maintaining aggregates and statistics over data streams, with respect to the last N data elements seen so far. We refer to this model as the sliding window model. We…
  • 776
  • 64
The community-search problem and how to plan a successful cocktail party
A lot of research in graph mining has been devoted to the discovery of communities. Most of the work has focused on the scenario where communities need to be discovered with only reference to the…
  • 294
  • 54
The query-flow graph: model and applications
Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as…
  • 319
  • 40
k-Nearest Neighbors in Uncertain Graphs
Complex networks, such as biological, social, and communication networks, often entail uncertainty, and thus can be modeled as probabilistic graphs. Similar to the problem of similarity search in…
  • 206
  • 40
Finding interesting associations without support pruning
  • E. Cohen, Mayur Datar, +5 authors Cheng Yang
  • Computer Science
  • Proceedings of 16th International Conference on…
  • 29 February 2000
Association rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of…
  • 375
  • 39
Know your neighbors: web spam detection using the web topology
Web spam can significantly deteriorate the quality of search engine results. Thus there is a large incentive for commercial search engines to detect spam pages efficiently and accurately. In this…
  • 351
  • 38
The Discrete Basis Problem
Matrix decomposition methods represent a data matrix as a product of two factor matrices: one containing basis vectors that represent meaningful concepts in the data, and another describing how the…
  • 218
  • 38
Fast shortest path distance estimation in large networks
In this paper we study approximate landmark-based methods for point-to-point distance estimation in very large networks. These methods involve selecting a subset of nodes as landmarks and computing…
  • 255
  • 32