Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions

@article{Strehl2002ClusterE,
  title={Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions},
  author={Alexander Strehl and Joydeep Ghosh},
  journal={J. Mach. Learn. Res.},
  year={2002},
  volume={3},
  pages={583--617}
}
This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization… 
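As a rough illustration of the shared-mutual-information objective mentioned in the abstract, the sketch below scores a candidate consensus labeling by its average normalized mutual information (NMI) with the input partitionings. It assumes NMI is normalized by the geometric mean of the two entropies; the function names (`entropy`, `mutual_information`, `nmi`, `average_nmi`) are hypothetical and are not taken from the paper or any library.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumption of this sketch)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # A single-cluster labeling carries no information.
        return 0.0
    return mutual_information(a, b) / sqrt(ha * hb)


def average_nmi(candidate, partitionings):
    """Average NMI between a candidate consensus and each input labeling."""
    return sum(nmi(candidate, p) for p in partitionings) / len(partitionings)
```

Only the label vectors are needed, which matches the setting described above: the score never touches the original features or the algorithms that produced the partitionings. Cluster labels are compared up to renaming, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` count as the same partitioning.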
Clustering ensembles: models of consensus and weak partitions
TLDR
A unified representation for multiple clusterings is introduced and a probabilistic model of consensus is proposed using a finite mixture of multinomial distributions in a space of clusterings in order to define a new consensus function related to the classical intraclass variance criterion.
An ensemble approach for generating partitional clusters from multiple cluster hierarchies
TLDR
In the document clustering domain, EPaCH is shown to yield higher quality clusters than phylogeny-based ensemble methods and than clustering based on a single feature set for three of four measures of cluster quality.
MULTI-VIEW DOCUMENT CLUSTERING WITH DIFFERENT SIMILARITY MEASUREMENTS VIA ENSEMBLE
TLDR
This paper proposes a novel multi-view clustering algorithm that combines different ensemble techniques, using various similarity metrics to measure the similarity between data objects, and reports that it significantly outperforms other methods.
An enhanced clusterer aggregation using nebulous pool
TLDR
This paper finds that a layered approach to combining the clusterer outputs can reduce the computational cost and also allows the knowledge gained to be reused in further merging.
An effective ensemble method for hierarchical clustering
TLDR
An effective ensemble algorithm for combining the results of hierarchical clustering of multiple contextually related, heterogeneous datasets that use different feature sets but consist of non-disjoint sets of objects.
Multi-view clustering of web documents using multi-objective genetic algorithm
TLDR
A new clustering method is presented that exploits multiple views to generate different clustering solutions and then selects a combination of clusters to form a final clustering solution based on Nondominated Sorting Genetic Algorithm (NSGA-II), which is a multi-objective optimization approach.
Coupled Clustering Ensemble by Exploring Data Interdependence
TLDR
A new coupled clustering ensemble (CCE) framework is proposed that can effectively capture the implicit interdependence relationships among base clusterings and among objects, achieving higher clustering accuracy, stability, and robustness than 14 state-of-the-art techniques, as supported by statistical analysis.
Cluster ensemble extraction for knowledge reuse framework
TLDR
A new approach, cluster ensemble extraction, is proposed as a knowledge reuse framework that creates new diversity without accessing the raw data: it derives a new set of clusterings from the existing ones, with more diversity and a larger size than the base clusterings.
CONSENSUS-BASED ENSEMBLES OF SOFT CLUSTERINGS
TLDR
Experimental results over a variety of real-life datasets are provided to show that using soft clusterings as input does offer significant advantages, especially when dealing with vertically partitioned data.
Weighted partition consensus via kernels

References

A Consensus Framework for Integrating Distributed Clusterings Under Limited Knowledge Sharing
TLDR
Three effective and efficient techniques for obtaining high-quality consensus functions are described and studied empirically for the following qualitatively different application scenarios: where the original clusters were formed based on non-identical sets of features, where the original clustering algorithms were applied to non-identical sets of objects, and where the individual solutions provide varying numbers of clusters.
Cluster ensembles: a knowledge reuse framework for combining partitionings
TLDR
This contribution is to formally define the cluster ensemble problem as an optimization problem and to propose three effective and efficient combiners for solving it based on a hypergraph model.
Iterative Optimization and Simplification of Hierarchical Clusterings
  • D. Fisher
  • Computer Science
    J. Artif. Intell. Res.
  • 1996
TLDR
This work evaluates an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one.
Chameleon: Hierarchical Clustering Using Dynamic Modeling
TLDR
Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters, which is important for dealing with highly variable clusters.
Partitioning-based clustering for Web document categorization
Clustering and isolation in the consensus problem for partitions
We examine the problem of aggregating several partitions of a finite set into a single consensus partition. We note that the dual concepts of clustering and isolation are especially significant in
Data clustering using evidence accumulation
  • A. Fred, Anil K. Jain
  • Computer Science
    Object recognition supported by user interaction for service robots
  • 2002
TLDR
Results on both synthetic and real data show the ability of the K-means method to identify arbitrarily shaped clusters in multidimensional data.
Refining Initial Points for K-Means Clustering
TLDR
A procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution that allows the iterative algorithm to converge to a “better” local minimum.
Impact of Similarity Measures on Web-page Clustering
TLDR
Comparing four popular similarity measures in conjunction with several clustering techniques, cosine and extended Jaccard similarities emerge as the best measures to capture human categorization behavior, while Euclidean performs poorest.
Collective, Hierarchical Clustering from Distributed, Heterogeneous Data
TLDR
This paper presents the Collective Hierarchical Clustering (CHC) algorithm, which first generates local cluster models and then combines them to generate the global cluster model of the data, and shows significant improvement over naive methods with O(n^2) communication costs.