Handbook of Cluster Analysis
@inproceedings{Hennig2015HandbookOC, title={Handbook of Cluster Analysis}, author={Christian Hennig and Marina Meilă and Fionn Murtagh and Roberto Rocci}, year={2015} }
Handbook of Cluster Analysis provides a comprehensive and unified account of the main research developments in cluster analysis. The approaches covered include methods that optimize an objective function describing how well the data are grouped around centroids, dissimilarity-based methods, mixture and partitioning models, and clustering methods inspired by nonparametric density estimation.
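The centroid-based stream mentioned above can be illustrated with a short, generic sketch of Lloyd's algorithm in Python/NumPy (not code from the Handbook), which alternates assignment and centroid updates to decrease the within-cluster sum of squares.

```python
# Generic sketch of Lloyd's algorithm (not code from the Handbook): alternate between
# assigning points to their nearest centroid and moving each centroid to the mean of
# its points, which monotonically decreases the within-cluster sum of squares.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    objective = ((X - centroids[labels]) ** 2).sum()  # the criterion being minimized
    return labels, centroids, objective

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 3.0)])
labels, centroids, objective = kmeans(X, k=2)
```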
330 Citations
Clustering strategy and method selection
- Computer Science
- 2015
This chapter provides a framework for the decisions required when carrying out a cluster analysis in practice and outlines a general attitude to clustering that connects these decisions closely to the clustering aims of a given application.
An empirical comparison and characterisation of nine popular clustering methods
- Computer Science · Advances in Data Analysis and Classification
- 2022
The study gives new insight into the ability of the methods to discover “true” clusterings, but also into properties of the clusterings that can be expected from the methods, which is crucial for choosing a method in a real situation without a given “true” clustering.
Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
- Computer Science · Stat. Comput.
- 2020
A set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature, and two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
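As a rough illustration of the aggregation idea (the particular internal indexes, the min-max calibration, and the candidate range below are stand-ins chosen for this sketch, not the calibration proposed in the paper):

```python
# Illustrative only: compute two internal validity indexes for several candidate
# numbers of clusters, rescale each index to [0, 1] across the candidates so they are
# comparable, and average them into one aggregated score per candidate.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
candidates = list(range(2, 9))
scores = []
for k in candidates:
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores.append([silhouette_score(X, labels), calinski_harabasz_score(X, labels)])
scores = np.array(scores)
calibrated = (scores - scores.min(axis=0)) / (scores.max(axis=0) - scores.min(axis=0) + 1e-12)
aggregated = calibrated.mean(axis=1)          # simple average as a stand-in aggregation
best_k = candidates[int(aggregated.argmax())]
```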
Distance‐based clustering of mixed data
- Computer Science · WIREs Computational Statistics
- 2018
Three streams are distinguished, ranging from basic data preprocessing (where all variables are converted to the same scale), to the use of specific distance measures for mixed data, and finally to so-called joint data reduction methods (a combination of dimension reduction and clustering) designed specifically for mixed data.
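The "specific distance measure" stream can be sketched with a Gower-style dissimilarity for mixed numeric and categorical records; the function, variable roles, and equal weighting below are illustrative assumptions, not the procedure recommended in the paper.

```python
# Gower-style dissimilarity sketch for mixed numeric/categorical records
# (illustrative assumptions only, not the paper's recommended procedure).
import numpy as np

def gower_distance(x, y, numeric_idx, categorical_idx, ranges):
    parts = []
    for j in numeric_idx:
        # Numeric variables: absolute difference scaled by the variable's observed range.
        parts.append(abs(x[j] - y[j]) / ranges[j] if ranges[j] > 0 else 0.0)
    for j in categorical_idx:
        # Categorical variables: simple matching (0 if the categories agree, 1 otherwise).
        parts.append(0.0 if x[j] == y[j] else 1.0)
    return float(np.mean(parts))

records = [[1.2, "red"], [3.4, "blue"], [1.0, "red"]]
ranges = {0: 3.4 - 1.0}                       # range of the single numeric variable
d01 = gower_distance(records[0], records[1], numeric_idx=[0], categorical_idx=[1], ranges=ranges)
```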
Nomclust 2.0: an R package for hierarchical clustering of objects characterized by nominal variables
- Computer Science · Computational Statistics
- 2022
The second generation of the nomclust R package is completely rewritten to be more natural for the workflow of R users, and includes new similarity measures and evaluation criteria.
Unsupervised Learning
- Computer Science · Encyclopedia of GIS
- 2017
This article reviews traditional and current methods of classification in the framework of unsupervised learning, in particular cluster analysis and self-organizing neural networks, both of which aim to minimize the distance between an input vector and its representation.
The Exploitation of Distance Distributions for Clustering
- Computer Science · Int. J. Comput. Intell. Appl.
- 2021
It is shown that multimodal distance distributions are preferable in cluster analysis, and it is advantageous to model distance distributions with Gaussian mixtures prior to the evaluation phase of unsupervised methods.
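A minimal sketch of the underlying idea, assuming scikit-learn's GaussianMixture and BIC as a stand-in criterion for judging multimodality of the distance distribution:

```python
# Minimal sketch: fit Gaussian mixtures with 1-3 components to the pairwise distance
# distribution; support for more than one component suggests multimodality.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
distances = pdist(X).reshape(-1, 1)           # all pairwise Euclidean distances
bics = [GaussianMixture(n_components=m, random_state=0).fit(distances).bic(distances)
        for m in (1, 2, 3)]
# A clearly lower BIC for m > 1 points to a multimodal distance distribution,
# which the paper argues is the favourable case for cluster analysis.
suggests_cluster_structure = int(np.argmin(bics)) > 0
```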
Validation of cluster analysis results on validation data: A systematic framework
- Computer Science · WIREs Data Mining Knowl. Discov.
- 2022
This work outlines a formal framework that covers most existing approaches for validating clustering results on validation data, and reviews classical validation techniques such as internal and external validation, stability analysis, and visual validation, and shows how they can be interpreted in terms of this framework.
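A rough sketch of one idea such a framework covers: fit a clustering on training data, transfer it to validation data, and compare with a clustering computed on the validation data directly. The split, the method, and the adjusted Rand index below are illustrative choices, not the framework's prescriptions.

```python
# Hold-out validation sketch (illustrative choices only, not the framework itself).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

X, _ = make_blobs(n_samples=400, centers=3, random_state=0)
X_train, X_val = train_test_split(X, test_size=0.5, random_state=0)

km_train = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
labels_transferred = km_train.predict(X_val)   # training solution carried over to validation data
labels_direct = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X_val)
agreement = adjusted_rand_score(labels_transferred, labels_direct)  # high agreement suggests stability
```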
Selection of the number of clusters in functional data analysis
- Computer Science · Journal of Statistical Computation and Simulation
- 2022
The main idea is to use a combination of two test statistics, which measure the lack of parallelism and the mean distance between curves, to compute criteria such as the within- and between-cluster sums of squares.
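For orientation, a small sketch of within- and between-cluster sums of squares for curves sampled on a common grid; treating each curve as a plain vector of grid values is an assumption made only for this illustration.

```python
# Within- and between-cluster sums of squares for discretised curves (illustration only).
import numpy as np

def within_between_ss(curves, labels):
    curves, labels = np.asarray(curves), np.asarray(labels)
    overall_mean = curves.mean(axis=0)
    wss, bss = 0.0, 0.0
    for g in np.unique(labels):
        group = curves[labels == g]
        centre = group.mean(axis=0)
        wss += ((group - centre) ** 2).sum()                       # spread within each cluster
        bss += len(group) * ((centre - overall_mean) ** 2).sum()   # separation between clusters
    return wss, bss

t = np.linspace(0.0, 1.0, 50)
curves = [np.sin(2 * np.pi * t), np.sin(2 * np.pi * t) + 0.1, np.cos(2 * np.pi * t)]
wss, bss = within_between_ss(curves, labels=[0, 0, 1])
```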
References
Showing 1-10 of 52 references
K-means clustering via principal component analysis
- Computer Science · ICML
- 2004
It is proved that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering, which indicates that unsupervised dimension reduction is closely related to unsupervised learning.
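The connection can be demonstrated empirically (this is an illustration, not the paper's proof): k-means in the space of the leading principal components typically recovers nearly the same partition as k-means on the raw data.

```python
# Empirical demonstration of the PCA / k-means connection (not the paper's proof).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)
labels_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_pc = PCA(n_components=2).fit_transform(X)    # k - 1 = 2 components for k = 3 clusters
labels_pc = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pc)
agreement = adjusted_rand_score(labels_raw, labels_pc)   # close to 1 when the partitions coincide
```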
A Spectral Clustering Approach To Finding Communities in Graphs
- Computer Science · SDM
- 2005
This paper shows how optimizing the Q function can be reformulated as a spectral relaxation problem and proposes two new spectral clustering algorithms that seek to maximize Q; experiments indicate that the new algorithms are efficient and effective at finding both good clusterings and the appropriate number of clusters across a variety of real-world graph data sets.
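For orientation only, a Newman-style leading-eigenvector split that maximizes a modularity-like Q for a two-way partition; it illustrates the spectral-relaxation idea but is not one of the specific algorithms proposed in the paper.

```python
# Newman-style leading-eigenvector modularity split (illustration of the spectral idea).
import numpy as np

def modularity_split(A):
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()
    B = A - np.outer(degrees, degrees) / two_m      # modularity matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    leading = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    community = (leading >= 0).astype(int)          # split nodes by the sign of its entries
    s = 2 * community - 1
    Q = s @ B @ s / (2 * two_m)                     # modularity of the resulting split
    return community, Q

# Two triangles joined by a single edge: the split should separate the triangles.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
community, Q = modularity_split(A)
```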
Finding Low Error Clusterings
- Computer Science · COLT
- 2009
New and more subtle structural properties for min-sum are derived in this context and used to design efficient algorithms for producing accurate clusterings, both in the transductive and in the inductive case.
Influence of graph construction on graph-based clustering measures
- Computer Science · NIPS
- 2008
This paper studies the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity and finds that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph.
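A small sketch of the kind of comparison studied: the same data yield different graphs (and possibly different spectral clusterings) depending on whether a k-nearest-neighbor or an r-neighborhood graph is built; all parameter values below are arbitrary.

```python
# Graph construction comparison: k-NN graph vs. r-neighborhood graph on the same data.
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
knn_graph = kneighbors_graph(X, n_neighbors=10, include_self=False)
r_graph = radius_neighbors_graph(X, radius=0.3, include_self=False)

# Symmetrize each adjacency matrix and cluster it with the same spectral method.
labels_knn = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(0.5 * (knn_graph + knn_graph.T))
labels_r = SpectralClustering(n_clusters=2, affinity="precomputed",
                              random_state=0).fit_predict(0.5 * (r_graph + r_graph.T))
```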
Regularized spectral learning
- Computer Science · AISTATS
- 2005
A new objective for learning in spectral clustering is formulated that balances a clustering accuracy term (the gap) and a stability term (the eigengap), with the latter in the role of a regularizer, and an algorithm to optimize this objective is derived.
Clusterability: A Theoretical Study
- Computer Science · AISTATS
- 2009
This work studies measures of the clusterability of data sets in a general setting, aiming for conclusions that apply regardless of any particular clustering algorithm or any specific data generation model, and proposes a new notion of data clusterability.
Spectral Methods for Automatic Multiscale Data Clustering
- Computer Science · 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)
- 2006
This paper provides new insights into how the method works and uses these to derive new algorithms which, given the data alone, automatically learn different plausible data partitionings.
Fast Large-Scale Spectral Clustering by Sequential Shrinkage Optimization
- Computer Science · ECIR
- 2007
This paper proposes a fast solver for spectral clustering that sequentially decides the labels of relatively well-separated data points and can achieve a significant improvement in speed compared to traditional spectral clustering algorithms.
On Spectral Clustering: Analysis and an algorithm
- Computer Science · NIPS
- 2001
A simple spectral clustering algorithm that can be implemented in a few lines of Matlab is presented, and tools from matrix perturbation theory are used to analyze the algorithm and give conditions under which it can be expected to do well.
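A minimal NumPy sketch in the spirit of the algorithm described (Gaussian affinities, normalized affinity matrix, leading eigenvectors, row normalization, then k-means on the embedding); the bandwidth sigma and other details are assumed tuning choices, not taken from the paper.

```python
# Ng-Jordan-Weiss style spectral clustering sketch (assumed parameter choices).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    A = np.exp(-squareform(pdist(X)) ** 2 / (2 * sigma ** 2))   # Gaussian affinity matrix
    np.fill_diagonal(A, 0.0)
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = D_inv_sqrt @ A @ D_inv_sqrt                              # normalized affinity
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, -k:]                                          # k leading eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)   # row-normalize the embedding
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

X = np.random.default_rng(0).normal(size=(100, 2))
labels = spectral_clustering(X, k=2)
```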
Active spectral clustering via iterative uncertainty reduction
- Computer Science · KDD
- 2012
An active learning algorithm is proposed that incrementally measures only those similarities that are most likely to remove uncertainty in an intermediate clustering solution, and shows a significant improvement in performance compared to the alternatives.