Model-Based Clustering, Discriminant Analysis, and Density Estimation
@article{Fraley2002ModelBasedCD, title={Model-Based Clustering, Discriminant Analysis, and Density Estimation}, author={Chris Fraley and Adrian E. Raftery}, journal={Journal of the American Statistical Association}, year={2002}, volume={97}, pages={611--631} }
Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be…
4,095 Citations
Methods for Clustering Data with Missing Values
- Computer Science
- 2016
An algorithm that uses marginal multivariate Gaussian densities for assignment probabilities was developed and tested against more conventional approaches to model-based clustering for incomplete data; for cases with many observations, complete-case analysis and multiple imputation were found to have advantages over the marginal density method.
Recent Developments in Model-Based Clustering with Applications
- Computer Science
- 2015
The latest developments in model-based clustering including semi-supervised clustering, non-parametric mixture modeling, choice of initialization strategies, merging mixture components for clusters, handling spurious solutions, and assessing variability of obtained partitions are reviewed.
A generalized Bayes framework for probabilistic clustering
- Computer Science
- Biometrika
- 2023
A generalized Bayes framework is proposed that bridges between these paradigms through the use of Gibbs posteriors, and provides a method of uncertainty quantification for these approaches; for example, allowing calculation of the probability a data point is well clustered.
Model-Based Clustering With Dissimilarities: A Bayesian Approach
- Computer Science
- 2007
The method carries out multidimensional scaling and model-based clustering simultaneously, and yields good object configurations and good clustering results with reasonable measures of clustering uncertainties, and can be used as a tool for dimension reduction when clustering high-dimensional objects.
Robust EM algorithm for model-based curve clustering
- Computer Science
- The 2013 International Joint Conference on Neural Networks (IJCNN)
- 2013
The approach handles both initialization and the choice of the optimal number of clusters as the EM learning proceeds, rather than in a two-stage scheme, by optimizing a penalized log-likelihood criterion.
A Population Background for Nonparametric Density-Based Clustering
- Computer Science
- 2014
It is shown that only mild conditions on a sequence of density estimators are needed to ensure that the sequence of modal clusterings they induce is consistent. Two new loss functions, applicable in fact to any clustering methodology, are presented to evaluate the performance of a data-based clustering algorithm with respect to the ideal population goal.
Model-based Clustering with Dissimilarities: A Bayesian Approach
- Computer Science
- 2003
The method carries out multidimensional scaling and model-based clustering simultaneously, and yields good object configurations and good clustering results with reasonable measures of clustering uncertainties, and can be used as a tool for dimension reduction when clustering high-dimensional objects.
Fast clustering using adaptive density peak detection
- Computer Science
- Statistical Methods in Medical Research
- 2017
This paper proposes a clustering procedure with adaptive density peak detection, in which the local density is estimated through nonparametric multivariate kernel estimation, and develops an automatic cluster centroid selection method that maximizes an average silhouette index.
A Bayesian Predictive Model for Clustering Data of Mixed Discrete and Continuous Type
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015
This paper introduces a model-based approach for clustering feature vectors of mixed type, allowing each feature to simultaneously take on both categorical and real values.
References
Model-based clustering and data transformations for gene expression data
- Computer Science
- Bioinform.
- 2001
The model-based approach has superior performance on synthetic data sets, consistently selecting the correct model and the number of clusters, and the validity of the Gaussian mixture assumption on different transformations of real data is explored.
How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis
- Computer Science
- Comput. J.
- 1998
The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model, and the EM result provides a measure of uncertainty about the associated classification of each data point.
Very Fast EM-Based Mixture Model Clustering Using Multiresolution Kd-Trees
- Computer Science
- NIPS
- 1998
A new algorithm is presented, based on the multiresolution kd-trees of [5], which dramatically reduces the cost of EM-based clustering, with savings rising linearly with the number of datapoints.
Hierarchical Model-Based Clustering for Large Datasets
- Computer Science
- 2001
This article proposes to start the hierarchical agglomeration from an efficient classification of the data in many classes rather than from the usual set of singleton clusters, and develops graphical tools that assess the presence of clusters in the data and uncover observations difficult to classify.
Inference in model-based cluster analysis
- Computer Science
- Stat. Comput.
- 1997
This work proposes a new approach to cluster analysis which consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors from the output using the Laplace–Metropolis estimator, which works well in several real and simulated examples.
Finding Curvilinear Features in Spatial Point Patterns: Principal Curve Clustering with Noise
- Computer Science
- IEEE Trans. Pattern Anal. Mach. Intell.
- 2000
The algorithm for principal curve clustering is in two steps: the first is hierarchical and agglomerative (HPCC), and the second consists of iterative relocation based on the classification EM algorithm (CEM-PCC). HPCC is used to combine potential feature clusters, while CEM-PCC refines the results and deals with background noise.
Model selection for probabilistic clustering using cross-validated likelihood
- Computer Science
- Stat. Comput.
- 2000
The cross-validation approach, as well as penalized likelihood and McLachlan's bootstrap method, are applied to two data sets and the results from all three methods are in close agreement.
Principal component analysis for clustering gene expression data
- Computer Science
- Bioinform.
- 2001
The empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality; the authors do not recommend PCA before clustering except in special circumstances.