Publications
Cluster Ensembles: A Knowledge Reuse Framework for Combining Multiple Partitions
This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that produced those partitionings. It proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions).
Top 10 algorithms in data mining
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006, including C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, and kNN.
Clustering with Bregman Divergences
This paper proposes and analyzes parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. It shows that there is a bijection between regular exponential families and a large class of Bregman divergences, called regular Bregman divergences.
Clustering on the Unit Hypersphere using von Mises-Fisher Distributions
This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher distribution, which arises naturally for data distributed on the unit hypersphere. It derives and analyzes two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture.
Cluster ensembles: a knowledge reuse framework for combining partitionings
The main contribution is to formally define the cluster ensemble problem as an optimization problem and to propose three effective and efficient combiners, based on a hypergraph model, for solving it.
Discovering important people and objects for egocentric video summarization
This work introduces novel egocentric features to train a regressor that predicts important regions. The resulting summaries are significantly more informative than those of traditional methods, which often include irrelevant or redundant information.
Impact of Similarity Measures on Web-page Clustering
Clustering of web documents enables (semi-)automated categorization and facilitates certain types of search. Any clustering method has to embed the documents in a suitable similarity space.
Error Correlation and Error Reduction in Ensemble Classifiers
This paper focuses on data selection and classifier training methods that "prepare" classifiers for combining, and discusses several methods that make the classifiers in an ensemble more complementary.
A generalized maximum entropy approach to Bregman co-clustering and matrix approximation
This paper presents a substantially generalized co-clustering framework in which any Bregman divergence can be used in the objective function. Various constraints based on conditional expectations can be imposed, depending on which statistics need to be preserved.