# How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis

@article{Fraley1998HowMC, title={How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis}, author={Chris Fraley and Adrian E. Raftery}, journal={Comput. J.}, year={1998}, volume={41}, pages={578-588} }

We consider the problem of determining the structure of clustered data, without prior knowledge of the number of clusters or any other information about their composition. Data are represented by a mixture model in which each component corresponds to a different cluster. Models with varying geometric properties are obtained through Gaussian components with different parametrizations and cross-cluster constraints. Noise and outliers can be modelled by adding a Poisson process component…
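
As a rough sketch of the model the abstract describes (the notation here is an assumption for illustration, not taken from this page): with $G$ clusters, mixing proportions $\tau_k$, and multivariate Gaussian density $\phi$, the mixture density can be written as below. The "different parametrizations" are commonly expressed through an eigenvalue decomposition of each component covariance, where $\lambda_k$ controls volume, $A_k$ shape, and $D_k$ orientation; constraining some of these to be equal across clusters gives models with different geometric properties.

```latex
f(x) = \sum_{k=1}^{G} \tau_k \, \phi(x \mid \mu_k, \Sigma_k),
\qquad \tau_k \ge 0, \quad \sum_{k=1}^{G} \tau_k = 1,
\qquad
\Sigma_k = \lambda_k \, D_k A_k D_k^{\top}
```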

## 2,597 Citations

### Genetic Algorithms for Subset Selection in Model-Based Clustering

- Computer Science
- 2016

Subset selection is recast as a model comparison problem in which BIC is used to approximate Bayes factors. The proposed criterion is based on the BIC difference between a candidate clustering model for a given subset and a model that assumes no clustering for the same subset.

### Bayesian estimation of membership uncertainty in model‐based clustering

- Computer Science
- 2014

It is demonstrated that model‐based clustering gives much better performance for overlapping clusters, a more reliable determination of the number of clusters in data, and better identification of clustering in the presence of outliers than agglomerative hierarchical clustering or iterative relocation clustering using a K‐means criterion.

### ON CLUSTER ANALYSIS A BAYESIAN AND MODEL-BASED APPROACH

- Computer Science
- 2006

A model-based approach to cluster analysis, which regards observations as outcomes of different distributions, is presented as an alternative to the mechanical classification used in deterministic clustering.

### Methods for Clustering Data with Missing Values

- Computer Science
- 2016

An algorithm that uses marginal multivariate Gaussian densities for assignment probabilities was developed and tested against more conventional approaches to model-based clustering of incomplete data. For cases with many observations, complete-case analysis and multiple imputation were found to have advantages over the marginal density method.

### Assessment and pruning of hierarchical model based clustering

- Computer Science
- KDD '03
- 2003

A new clustering method is proposed that can be regarded as a hybrid between model-based and nonparametric clustering; the hybrid algorithm prunes the cluster tree generated by hierarchical model-based clustering.

### Combining Mixture Components for Clustering

- Computer Science
- Journal of Computational and Graphical Statistics: a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America
- 2010

This paper proposes first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion, which yields a unique soft clustering for each number of clusters less than or equal to K.

### clusterBMA: Bayesian model averaging for clustering

- Computer Science
- 2022

Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble and consensus clustering literature. The approach of…

### On Comparing the Clustering of Regression Models Method with K-means Clustering

- Computer Science
- 2007

It is shown that the two clustering methods, CORM and K-means, can both be considered as solutions to a least squares problem with missing data but they each concern a different type of least squares.

### Integrated classification likelihood for model selection in block clustering

- Computer Science
- 2012

A criterion based on an approximation of the integrated classification likelihood (ICL) of block models is developed, and a BIC-like criterion derived from the form obtained is proposed.

### Methods of Determining the Number of Clusters in a Data Set and a New Clustering Criterion

- Computer Science
- 2005

This newly defined clustering method is aimed at overcoming the so-called "equal-size" problem associated with the k-means method, while maintaining its advantage of computational simplicity.

## References


### Inference in model-based cluster analysis

- Computer Science
- Stat. Comput.
- 1997

This work proposes a new approach to cluster analysis which consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors from the output using the Laplace–Metropolis estimator, which works well in several real and simulated examples.

### Robust Cluster Analysis via Mixtures of Multivariate t-Distributions

- Mathematics, Computer Science
- SSPR/SPR
- 1998

The expectation-maximization (EM) algorithm can be used to fit mixtures of multivariate t-distributions by maximum likelihood and it is demonstrated how the use of t-components provides less extreme estimates of the posterior probabilities of cluster membership.

### Model-based Gaussian and non-Gaussian clustering

- Computer Science
- 1993

The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.

### Choosing the Number of Component Clusters in the Mixture-Model Using a New Informational Complexity Criterion of the Inverse-Fisher Information Matrix

- Computer Science
- 1993

The informational complexity (ICOMP) criterion of IFIM of this author is derived and proposed as a new criterion for choosing the number of clusters in the mixture-model and the significance of ICOMP is illustrated.

### Algorithms for Model-Based Gaussian Hierarchical Clustering

- Computer Science
- SIAM J. Sci. Comput.
- 1998

It is shown how the structure of the Gaussian model can be exploited to yield efficient algorithms for agglomerative hierarchical clustering.

### Principal Curve Clustering With Noise

- Computer Science
- 1997

The algorithm for principal curve clustering has two steps: the first is hierarchical and agglomerative (HPCC), and the second consists of iterative relocation based on the Classification EM algorithm.

### The classification and mixture maximum likelihood approaches to cluster analysis

- Mathematics
- Classification, Pattern Recognition and Reduction of Dimensionality
- 1982

### Autoclass — A Bayesian Approach to Classification

- Computer Science, Mathematics
- 1996

A Bayesian approach to the unsupervised discovery of classes in a set of cases, sometimes called finite mixture separation or clustering, which allows direct comparison of alternate density functions that differ in number of classes and/or individual class density functions.