• Publications
Clustering Aggregation
TLDR
This work gives a formal statement of the clustering-aggregation problem and an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions, and suggests a number of algorithms to improve the robustness of clusterings.
LIMBO: Scalable Clustering of Categorical Data
TLDR
This work introduces LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering, and shows how the LIMBO algorithm can be used to cluster both tuples and values.
Assessing data mining results via swap randomization
TLDR
For some datasets the structure discovered by the data mining algorithms is expected, given the row and column margins of the datasets, while for other datasets the discovered structure conveys information that is not captured by the margin counts.
Clustering aggregation
TLDR
This work gives a formal statement of the clustering-aggregation problem and an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions, and suggests a number of algorithms to improve the robustness of clusterings.
Link analysis ranking: algorithms, theory, and experiments
TLDR
This article works within the hubs and authorities framework defined by Kleinberg and proposes new families of algorithms, and provides an axiomatic characterization of the INDEGREE heuristic which ranks each node according to the number of incoming links.
Finding authorities and hubs from link structures on the World Wide Web
TLDR
A comparative study of hypertext link analysis algorithms is undertaken, guided by some experimental queries, and some formal criteria for evaluating and comparing link analysis algorithms are proposed.
Using strong triadic closure to characterize ties in social networks
TLDR
This paper considers the problem of labeling the ties of a social network as strong or weak so as to enforce the Strong Triadic Closure property, formulates it as a novel combinatorial optimization problem, and studies it theoretically and experimentally.
Efficient Algorithms for Sequence Segmentation
TLDR
This work presents an alternative constant-factor approximation algorithm with running time O(nk), called the DNS algorithm, that outperforms other widely used heuristics and can speed up solutions for other variants of the basic segmentation problem while keeping their approximation factors constant.
Using the wisdom of the crowds for keyword generation
TLDR
This work identifies queries related to a campaign by exploiting the associations between queries and URLs as they are captured by the user's clicks, and proposes algorithms within the Markov Random Field model to solve this problem.
Mining the inner structure of the Web graph
TLDR
It is found that the scale-free properties permeate all the components of the bow-tie, which exhibit the same macroscopic properties as the Web graph itself; close inspection, however, reveals that their inner structure is quite distinct.