Clustering strategy and method selection
@article{Hennig2015ClusteringSA, title={Clustering strategy and method selection}, author={Christian Hennig}, journal={arXiv: Methodology}, year={2015} }
Note: This paper is a chapter in the forthcoming Handbook of Cluster Analysis, Hennig et al. (2015). For definitions of basic clustering methods and some further methodology, other chapters of the Handbook are referred to. To read this version of the paper without the Handbook, some knowledge of cluster analysis methodology is required. The aim of this chapter is to provide a framework for all the decisions that are required when carrying out a cluster analysis in practice. A general attitude…
47 Citations
An empirical comparison and characterisation of nine popular clustering methods
- Computer Science
- Adv. Data Anal. Classif.
- 2022
The study gives new insight into the ability of the methods to discover “true” clusterings, but also into properties of clusterings that can be expected from the methods, which is crucial for the choice of a method in a real situation without a given “true” clustering.
Cluster Validation by Measurement of Clustering Characteristics Relevant to the User
- Computer Science
- Data Analysis and Applications 1
- 2019
A focus of the paper is on methodology to standardise the different characteristics of a clustering so that users can aggregate them in a suitable way specifying weights for the various criteria that are relevant in the clustering application at hand.
Benchmarking in cluster analysis: A white paper
- Computer Science
- 2018
To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data…
Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
- Computer Science
- Stat. Comput.
- 2020
A set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature, and two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score
- Computer Science
- ArXiv
- 2021
This paper develops two cluster-quality criteria that are consistent with groups generated from a class of elliptic-symmetric distributions and proposes a selection rule that allows choosing among many clustering solutions, eventually obtained from different methods.
Validation of cluster analysis results on validation data: A systematic framework
- Computer Science
- WIREs Data Mining Knowl. Discov.
- 2022
This work outlines a formal framework that covers most existing approaches for validating clustering results on validation data, and reviews classical validation techniques such as internal and external validation, stability analysis, and visual validation, and shows how they can be interpreted in terms of this framework.
Customer Choice Modelling: A Multi-Level Consensus Clustering Approach
- Computer Science, Business
- 2021
This work presents a Multi-level Consensus Clustering approach combining the results of several clustering algorithmic configurations to generate a hierarchy of consensus clusters in which each cluster represents an agreement between different clustering results.
References
Handbook of Cluster Analysis
- Computer Science
- 2015
Handbook of Cluster Analysis provides a comprehensive and unified account of the main research developments in cluster analysis. Written by active, distinguished researchers in this area, the book…
Clustering: Science or Art?
- Computer Science
- ICML Unsupervised and Transfer Learning
- 2012
It is argued that it will be useful to build a "taxonomy of clustering problems" to identify clustering applications which can be treated in a unified way and that such an effort will be more fruitful than attempting the impossible--developing "optimal" domain-independent clustering algorithms or even classifying clustering algorithms in terms of how they work.
An indication of unification for different clustering approaches
- Computer Science
- Pattern Recognit.
- 2013
A study of standardization of variables in cluster analysis
- Computer Science
- 1988
The present simulation study examined the standardization problem and found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure.
Towards Property-Based Classification of Clustering Paradigms
- Computer Science
- NIPS
- 2010
It is demonstrated how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms, and Kleinberg's famous impossibility result is strengthened.
A Method for Visual Cluster Validation
- Computer Science
- GfKl
- 2004
A methodology to explore two aspects of a cluster found by any cluster analysis method: the cluster should be separated from the rest of the data, and the points of the clusters should not split up into further separated subclasses.
Clusterability: A Theoretical Study
- Computer Science
- AISTATS
- 2009
This work addresses measures of the clusterability of data sets with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model, as well as proposing a new notion of data clusterability.
Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering
- Computer Science
- 2011
(dissimilarity) matrix shading is extended with several reordering steps based on seriation techniques and it is shown that dissimilarity plots scale very well with increasing data dimensionality.
A theory of proximity based clustering: structure detection by optimization
- Computer Science
- Pattern Recognit.
- 2000
Measures of Clustering Quality: A Working Set of Axioms for Clustering
- Computer Science
- NIPS
- 2008
It is shown that principles like those formulated in Kleinberg's axioms can be readily expressed in the latter framework without leading to inconsistency, and several natural clustering quality measures are proposed, all satisfying the proposed axioms.