An empirical comparison and characterisation of nine popular clustering methods

@article{Hennig2021AnEC,
  title={An empirical comparison and characterisation of nine popular clustering methods},
  author={Christian Hennig},
  journal={Advances in Data Analysis and Classification},
  year={2021},
  volume={16},
  pages={201--229}
}
  • C. Hennig
  • Published 6 February 2021
  • Computer Science
  • Advances in Data Analysis and Classification
Nine popular clustering methods are applied to 42 real data sets. The aim is to give a detailed characterisation of the methods by means of several cluster validation indexes that measure various individual aspects of the resulting clusters such as small within-cluster distances, separation of clusters, closeness to a Gaussian distribution etc. as introduced in Hennig (in: Data analysis and applications 1: clustering and regression, modeling—estimating, forecasting and data mining, ISTE Ltd… 
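As a rough illustration of what indexes for "small within-cluster distances" and "separation of clusters" might measure, the following minimal sketch computes two simplified quantities for a given partition. These are stand-ins chosen for illustration and are not the index definitions used in the paper; the function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_within_distance(points, labels):
    """Mean distance over all pairs of points that share a cluster label."""
    dists = [
        euclidean(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] == labels[j]
    ]
    return sum(dists) / len(dists) if dists else 0.0


def minimum_separation(points, labels):
    """Smallest distance between two points from different clusters."""
    dists = [
        euclidean(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] != labels[j]
    ]
    return min(dists) if dists else float("inf")


# Hypothetical toy data: two tight, well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

within = average_within_distance(points, labels)
separation = minimum_separation(points, labels)
print(within, separation)
```

A clustering that scores well on one such aspect need not score well on another, which is why the aspects are measured separately rather than folded into a single number.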
2 Citations

Benchmarking distance-based partitioning methods for mixed-type data

A benchmarking study comparing eight distance-based partitioning methods for mixed-type data in terms of cluster recovery performance, finding that KAMILA, K-Prototypes and sequential Factor Analysis and K-Means clustering typically performed better than other methods.

Over-optimistic evaluation and reporting of novel cluster algorithms: an illustrative study

An illustrative study that illuminates the mechanisms by which authors, consciously or unconsciously, paint their cluster algorithm’s performance in an over-optimistic light, and shows how easy it can be for researchers to claim apparent “superiority” of a new cluster algorithm.

References

Cluster Validation by Measurement of Clustering Characteristics Relevant to the User

  • C. Hennig
  • Computer Science
    Data Analysis and Applications 1
  • 2019
A focus of the paper is on methodology to standardise the different characteristics of a clustering so that users can aggregate them in a suitable way specifying weights for the various criteria that are relevant in the clustering application at hand.

Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

A set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature, and two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.

Clustering strategy and method selection

The aim of this chapter is to provide a framework for all the decisions that are required when carrying out a cluster analysis in practice, and a general attitude to clustering is outlined, which connects these decisions closely to the clustering aims in a given application.

Landscape of clustering algorithms

This work empirically studies the similarity of clustering solutions obtained by many traditional as well as relatively recent clustering algorithms on a number of real-world data sets, and finds that only a small number of clustering algorithms is sufficient to represent a large spectrum of clustering criteria.

Handbook of Cluster Analysis

Handbook of Cluster Analysis provides a comprehensive and unified account of the main research developments in cluster analysis. Written by active, distinguished researchers in this area, the book

Model-Based Clustering, Discriminant Analysis, and Density Estimation

This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

Performance Evaluation of Some Clustering Algorithms and Validity Indices

In this article, we evaluate the performance of three clustering algorithms, hard K-Means, single linkage, and a simulated annealing (SA) based technique, in conjunction with four cluster validity

Comparing clusterings---an information based distance

Evaluating mixture modeling for clustering: recommendations and cautions.

Focus is given to the multivariate normal distribution, and 9 separate decompositions (i.e., class structures) of the covariance matrix are investigated; degraded performance was observed for both K-means clustering and mixture-model clustering.

Towards Property-Based Classification of Clustering Paradigms

It is demonstrated how abstract, intuitive properties of clustering functions can be used to taxonomize a set of popular clustering algorithmic paradigms, and Kleinberg's famous impossibility result is strengthened.