• Publications
  • Influence
Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions
tl;dr
This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings.
  • 3,567
  • 660
  • Open Access
Top 10 algorithms in data mining
tl;dr
This paper presents the top 10 data mining algorithms identified by the International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART.
  • 3,864
  • 189
  • Open Access
Clustering with Bregman Divergences
tl;dr
In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences.
  • 1,389
  • 164
  • Open Access
Cluster ensembles: a knowledge reuse framework for combining partitionings
tl;dr
We formally define the cluster ensemble problem as an optimization problem and propose three effective and efficient combiners for solving it based on a hypergraph model.
  • 457
  • 101
  • Open Access
Clustering on the Unit Hypersphere using von Mises-Fisher Distributions
tl;dr
This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher distribution, which arises naturally for data distributed on the unit hypersphere.
  • 711
  • 95
  • Open Access
Impact of Similarity Measures on Web-page Clustering
Clustering of web documents enables (semi-)automated categorization, and facilitates certain types of search. Any clustering method has to embed the documents in a suitable similarity space. While…
  • 775
  • 49
  • Open Access
Discovering important people and objects for egocentric video summarization
We developed an approach to summarize egocentric video. We introduced novel egocentric features to train a regressor that predicts important regions. Using the discovered important regions, our…
  • 553
  • 49
  • Open Access
Error Correlation and Error Reduction in Ensemble Classifiers
tl;dr
In this paper, we focus on data selection and classifier training methods, in order to 'prepare' classifiers for combining, especially when the training data are in limited supply.
  • 619
  • 36
  • Open Access
A generalized maximum entropy approach to Bregman co-clustering and matrix approximation
tl;dr
We present a substantially generalized co-clustering framework wherein any Bregman divergence can be used in the objective function, and various conditional expectation based constraints can be considered based on the statistics that need to be preserved.
  • 403
  • 36
  • Open Access
Investigation of the random forest framework for classification of hyperspectral data
tl;dr
This work investigates two approaches based on the concept of random forests of classifiers implemented within a binary hierarchical multiclassifier, with the goal of achieving improved generalization of the classifier in analysis of hyperspectral data, particularly when the quantity of training data is limited.
  • 672
  • 27
  • Open Access