Clustering Aggregation
TLDR
We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings.
  • 307
  • 30
Assessing data mining results via swap randomization
TLDR
The problem of assessing the significance of data mining results on high-dimensional 0–1 datasets has been studied extensively in the literature.
  • 246
  • 25
LIMBO: Scalable Clustering of Categorical Data
TLDR
We introduce LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering.
  • 261
  • 24
Link analysis ranking: algorithms, theory, and experiments
TLDR
The explosive growth and the widespread accessibility of the Web have led to a surge of research activity in the area of information retrieval on the World Wide Web.
  • 326
  • 19
Clustering aggregation
TLDR
We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings.
  • 464
  • 17
Finding authorities and hubs from link structures on the World Wide Web
TLDR
We undertake a comparative study of hypertext link analysis algorithms and propose some formal criteria for evaluating and comparing them.
  • 277
  • 13
Using strong triadic closure to characterize ties in social networks
TLDR
We use the principle of Strong Triadic Closure to characterize the strength of relationships in social networks.
  • 42
  • 10
Using the wisdom of the crowds for keyword generation
TLDR
We identify queries related to a campaign by exploiting the associations between queries and URLs as they are captured by the user's clicks.
  • 161
  • 9
Mining the inner structure of the Web graph
TLDR
We show that the scale-free properties permeate all the components of the bow-tie, which exhibit the same macroscopic properties as the Web graph.
  • 70
  • 9
Efficient Algorithms for Sequence Segmentation
TLDR
In this paper, we present an alternative constant-factor approximation algorithm with running time O(nk) for the sequence segmentation problem.
  • 92
  • 8