Corpus ID: 123709149

MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering

@inproceedings{Fraley2007MCLUSTV3,
  title={MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering},
  author={Chris Fraley and Adrian E. Raftery},
  year={2007}
}
MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and… 
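The combination of EM estimation and BIC-based model choice described in the abstract can be illustrated with a minimal sketch. This is not MCLUST's code or API: it is a plain-Python toy restricted to univariate data with unequal component variances, and the names `em_gmm_1d`, `bic`, and the simulated data are hypothetical. BIC is taken here as 2 × log-likelihood minus (number of parameters) × log(n), so larger values are preferred.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under a normal mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_gmm_1d(data, n_components, max_iter=200, tol=1e-8, min_var=1e-6):
    """Fit a univariate normal mixture by EM (illustrative, not MCLUST)."""
    n = len(data)
    ordered = sorted(data)
    # Assumption: start the means at evenly spaced sample quantiles.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    centre = sum(data) / n
    pooled_var = sum((x - centre) ** 2 for x in data) / n
    variances = [pooled_var] * n_components
    weights = [1.0 / n_components] * n_components

    prev = log_likelihood(data, weights, means, variances)
    cur = prev
    for _ in range(max_iter):
        # E step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: weighted updates of proportions, means, and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor guards against collapse
        cur = log_likelihood(data, weights, means, variances)
        if cur - prev < tol:
            break
        prev = cur
    return weights, means, variances, cur


def bic(loglik, n_components, n):
    """BIC with larger-is-better sign: 2*loglik - n_params*log(n)."""
    n_params = 3 * n_components - 1  # (G-1) proportions + G means + G variances
    return 2.0 * loglik - n_params * math.log(n)


# Simulated two-cluster data (hypothetical example values).
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(150)]
        + [random.gauss(6.0, 1.0) for _ in range(150)])

fits = {g: em_gmm_1d(data, g) for g in (1, 2, 3)}
bics = {g: bic(fits[g][3], g, len(data)) for g in fits}
best_g = max(bics, key=bics.get)
print("BIC by number of components:", bics)
print("Selected number of components:", best_g)
```

The variance floor `min_var` is a crude stand-in for the degeneracy handling that a production package would need; it is an assumption of this sketch rather than a description of how MCLUST behaves.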
Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering
TLDR
A modified version of BIC is proposed, where the likelihood is evaluated at the MAP instead of the MLE, and the resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE.
Mixture model averaging for clustering
TLDR
This work averages multiple models that are in some sense close to the best one, thereby producing a weighted average of clustering results, and introduces a method for merging mixture components based on the adjusted Rand index.
Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution
TLDR
A new class of distributions, multivariate t distributions with the Box-Cox transformation, is proposed for mixture modeling, which provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues.
Mixture model selection via hierarchical BIC
Using conditional independence for parsimonious model-based Gaussian clustering
TLDR
Novel Gaussian parsimonious clustering models are proposed in which constraints are placed on the component-specific variance matrices. The models are obtained by assuming that the variables can be partitioned into groups that are conditionally independent within components, which produces component-specific variance matrices with a block diagonal structure.
Genetic Algorithms for Subset Selection in Model-Based Clustering
TLDR
The problem of subset selection is recast as a model comparison problem, and BIC is used to approximate Bayes factors, and the criterion proposed is based on the BIC difference between a candidate clustering model for the given subset and a model which assumes no clustering for the same subset.
Cluster Analysis, Model Selection, and Prior Distributions on Models
TLDR
A product partition model and a model selection procedure based on Bayes factors from intrinsic priors are developed, and it is found that a new prior, the hierarchical uniform prior, leads to consistent model selection procedures and has other desirable properties.
Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions
TLDR
A novel family of mixture models wherein each component is modeled using a multivariate t-distribution with an eigen-decomposed covariance structure is put forth, known as the tEIGEN family.
Model‐based clustering of longitudinal data
TLDR
A new family of mixture models for the model‐based clustering of longitudinal data is introduced. The covariance structures of eight members are given, and the associated maximum likelihood estimates for the parameters are derived via expectation–maximization (EM) algorithms.

References

Enhanced Model-Based Clustering, Density Estimation, and Discriminant Analysis Software: MCLUST
MCLUST is a software package for model-based clustering, density estimation and discriminant analysis interfaced to the S-PLUS commercial software and the R language. It implements parameterized
Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering
TLDR
A modified version of BIC is proposed, where the likelihood is evaluated at the MAP instead of the MLE, and the resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE.
Model-based Gaussian and non-Gaussian clustering
TLDR
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.
Model-Based Clustering, Discriminant Analysis, and Density Estimation
TLDR
This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis
TLDR
The problems of determining the number of clusters and the clustering method are solved simultaneously by choosing the best model, and the EM result provides a measure of uncertainty about the associated classification of each data point.
Gaussian parsimonious clustering models
Model-based Methods of Classification: Using the mclust Software in Chemometrics
Due to recent advances in methods and software for model-based clustering, and to the interpretability of the results, clustering procedures based on probability models are increasingly preferred
Incremental Model-Based Clustering for Large Datasets With Small Clusters
TLDR
An incremental approach for data that can be processed as a whole in memory is proposed, which is relatively efficient computationally and has the ability to find small clusters in large datasets.
Model-Based Clustering for Image Segmentation and Large Datasets via Sampling
TLDR
These experiments suggest that a stable method with better performance can be obtained with two straightforward modifications to the simple sampling method: several tentative models are identified from the sample instead of just one, and several EM steps are used rather than just one E step to classify the full data set.
Detecting features in spatial point processes with clutter via model-based clustering
We consider the problem of detecting features, such as minefields or seismic faults, in spatial point processes when there is substantial clutter. We use model-based clustering based on a