Clustering strategy and method selection

@article{Hennig2015ClusteringSA,
  title={Clustering strategy and method selection},
  author={Christian Hennig},
  journal={arXiv: Methodology},
  year={2015}
}
  • C. Hennig
  • Published 6 March 2015
  • Computer Science
  • arXiv: Methodology
Note: This paper is a chapter in the forthcoming Handbook of Cluster Analysis, Hennig et al. (2015). For definitions of basic clustering methods and some further methodology, other chapters of the Handbook are referred to. To read this version of the paper without the Handbook, some knowledge of cluster analysis methodology is required. The aim of this chapter is to provide a framework for all the decisions that are required when carrying out a cluster analysis in practice. A general attitude… 

An empirical comparison and characterisation of nine popular clustering methods

  • C. Hennig
  • Computer Science
    Advances in Data Analysis and Classification
  • 2022
The study gives new insight into the ability of the methods to discover “true” clusterings, but also into properties of clusterings that can be expected from the methods, which is crucial for the choice of a method in a real situation without a given “true” clustering.

Cluster Validation by Measurement of Clustering Characteristics Relevant to the User

  • C. Hennig
  • Computer Science
    Data Analysis and Applications 1
  • 2019
A focus of the paper is on methodology to standardise the different characteristics of a clustering so that users can aggregate them in a suitable way specifying weights for the various criteria that are relevant in the clustering application at hand.

Benchmarking in cluster analysis: A white paper

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data

Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

A set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature, and two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.

Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score

This paper develops two cluster-quality criteria that are consistent with groups generated from a class of elliptic-symmetric distributions and proposes a selection rule that allows choosing among many clustering solutions, eventually obtained from different methods.

Latent Class Cluster Analysis: Selecting the number of clusters

Validation of cluster analysis results on validation data: A systematic framework

This work outlines a formal framework that covers most existing approaches for validating clustering results on validation data, and reviews classical validation techniques such as internal and external validation, stability analysis, and visual validation, and shows how they can be interpreted in terms of this framework.

Customer Choice Modelling: A Multi-Level Consensus Clustering Approach

This work presents a Multi-level Consensus Clustering approach combining the results of several clustering algorithmic configurations to generate a hierarchy of consensus clusters in which each cluster represents an agreement between different clustering results.

References

Handbook of Cluster Analysis

Handbook of Cluster Analysis provides a comprehensive and unified account of the main research developments in cluster analysis. Written by active, distinguished researchers in this area, the book

Clustering: Science or Art?

It is argued that it will be useful to build a "taxonomy of clustering problems" to identify clustering applications which can be treated in a unified way and that such an effort will be more fruitful than attempting the impossible--developing "optimal" domain-independent clustering algorithms or even classifying clustering algorithms in terms of how they work.

An indication of unification for different clustering approaches

A study of standardization of variables in cluster analysis

The present simulation study examined the standardization problem and found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure.

Towards Property-Based Classification of Clustering Paradigms

It is demonstrated how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms, and Kleinberg's famous impossibility result is strengthened.

A Method for Visual Cluster Validation

A methodology to explore two aspects of a cluster found by any cluster analysis method: the cluster should be separated from the rest of the data, and the points of the clusters should not split up into further separated subclasses.

Clusterability: A Theoretical Study

This work addresses measures of the clusterability of data sets with generality, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model, as well as proposing a new notion of data clusterability.

Dissimilarity Plots: A Visual Exploration Tool for Partitional Clustering

(dissimilarity) matrix shading is extended with several reordering steps based on seriation techniques and it is shown that dissimilarity plots scale very well with increasing data dimensionality.

A theory of proximity based clustering: structure detection by optimization

Measures of Clustering Quality: A Working Set of Axioms for Clustering

It is shown that principles like those formulated in Kleinberg's axioms can be readily expressed in the latter framework without leading to inconsistency, and several natural clustering quality measures are proposed, all satisfying the proposed axioms.
...