Parsimonious mixtures of multivariate contaminated normal distributions
Antonio Punzo and Paul D. McNicholas, Biometrical Journal
A mixture of multivariate contaminated normal distributions is developed for model-based clustering. In addition to the parameters of the classical normal mixture, our contaminated mixture has, for each cluster, a parameter controlling the proportion of mild outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance…
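As a sketch of the idea in the abstract above: each cluster's contaminated normal density mixes a "typical" normal with an inflated-covariance "outlier" normal, and the per-point probability of being typical follows by Bayes' rule. The parameterization below (alpha for the proportion of typical points, eta for the degree of contamination) is the standard one for contaminated normals and is assumed here, not quoted from the abstract.

```python
# Illustrative sketch (assumed parameterization, not code from the paper):
# a contaminated normal mixes a "typical" normal with an inflated-covariance
# "outlier" normal; alpha = proportion of typical points, eta > 1 = degree
# of contamination.
import numpy as np

def mvn_pdf(x, mu, Sigma):
    # density of a multivariate normal at x
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff
    return np.exp(-0.5 * maha) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

def contaminated_pdf(x, mu, Sigma, alpha, eta):
    return alpha * mvn_pdf(x, mu, Sigma) + (1 - alpha) * mvn_pdf(x, mu, eta * Sigma)

def prob_typical(x, mu, Sigma, alpha, eta):
    # a posteriori probability that x is a typical ("good") point
    return alpha * mvn_pdf(x, mu, Sigma) / contaminated_pdf(x, mu, Sigma, alpha, eta)

mu, Sigma = np.zeros(2), np.eye(2)
print(prob_typical(np.array([0.0, 0.0]), mu, Sigma, alpha=0.95, eta=10.0))  # near 1
print(prob_typical(np.array([5.0, 5.0]), mu, Sigma, alpha=0.95, eta=10.0))  # near 0
```

This per-point typicality probability is what lets such models flag mild outliers automatically rather than requiring the contamination parameters a priori.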
ContaminatedMixt: An R Package for Fitting Parsimonious Mixtures of Multivariate Contaminated Normal Distributions
We introduce the R package ContaminatedMixt, conceived to disseminate the use of mixtures of multivariate contaminated normal distributions as a tool for robust clustering and classification under…
Model-based clustering via new parsimonious mixtures of heavy-tailed distributions
Two families of parsimonious mixture models are introduced for model-based clustering. They are based on two multivariate distributions: the shifted exponential normal and the tail-inflated…
Mixtures of multivariate contaminated normal regression models
Mixtures of regression models (MRMs) are widely used to investigate the relationship between variables coming from several unknown latent homogeneous groups. Usually, the conditional distribution of…
Multiple scaled contaminated normal distribution and its application in clustering
The multivariate contaminated normal (MCN) distribution represents a simple heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptically contoured scatters in the…
Mixtures of Matrix-Variate Contaminated Normal Distributions
The matrix-variate contaminated normal distribution is discussed and then utilized in the mixture model paradigm for clustering; a key advantage of the proposed model is the ability to automatically detect potential outlying matrices by computing their a posteriori probability of being typical or atypical.
Mixtures of Contaminated Matrix Variate Normal Distributions
The contaminated matrix variate normal distribution is discussed and then utilized in the mixture model paradigm for clustering; a key advantage of the proposed model is the ability to automatically detect potential outlying matrices by computing their probability of being a "good" or "bad" point.
Maximum a-posteriori estimation of autoregressive processes based on finite mixtures of scale-mixtures of skew-normal distributions
ABSTRACT This article investigates maximum a-posteriori (MAP) estimation of autoregressive model parameters when the innovations (errors) follow a finite mixture of distributions that, in turn, are…
Gaussian parsimonious clustering models with covariates and a noise component
This paper addresses the equivalent aims of including covariates in Gaussian Parsimonious clustering models and incorporating parsimonious covariance structures into all special cases of the Gaussian mixture of experts framework.


Constrained monotone EM algorithms for mixtures of multivariate t distributions
A constrained monotone algorithm implementing maximum likelihood mixture decomposition of multivariate t distributions is proposed, to achieve improved convergence capabilities and robustness.
Robust Cluster Analysis via Mixtures of Multivariate t-Distributions
The expectation-maximization (EM) algorithm can be used to fit mixtures of multivariate t-distributions by maximum likelihood and it is demonstrated how the use of t-components provides less extreme estimates of the posterior probabilities of cluster membership.
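The claim that t components give "less extreme" posterior membership probabilities can be illustrated numerically. The setup below (means, equal weights, degrees of freedom, and the outlier location) is invented for the demo, not taken from the paper: an outlier far from both cluster means gets a near-certain assignment under normal components but a far more cautious one under t components.

```python
# Hypothetical numeric illustration (values chosen for the demo): posterior
# cluster membership of an outlier under normal vs t components, with two
# equally weighted clusters sharing an identity covariance.
import numpy as np
from math import gamma, pi

def mvn_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff
    return np.exp(-0.5 * maha) / np.sqrt((2 * pi) ** d * np.linalg.det(Sigma))

def mvt_pdf(x, mu, Sigma, nu):
    # density of a multivariate t with nu degrees of freedom
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff
    c = gamma((nu + d) / 2) / (gamma(nu / 2) * (nu * pi) ** (d / 2)
                               * np.sqrt(np.linalg.det(Sigma)))
    return c * (1 + maha / nu) ** (-(nu + d) / 2)

mu1, mu2 = np.zeros(2), np.array([4.0, 0.0])
Sigma = np.eye(2)
x = np.array([0.0, 10.0])  # an outlier far from both cluster means

def posterior1(f1, f2):
    return f1 / (f1 + f2)  # equal mixing weights

post_normal = posterior1(mvn_pdf(x, mu1, Sigma), mvn_pdf(x, mu2, Sigma))
post_t = posterior1(mvt_pdf(x, mu1, Sigma, 3), mvt_pdf(x, mu2, Sigma, 3))
print(post_normal, post_t)  # the t posterior is much closer to 0.5
```

The light normal tails make the outlier's tiny density ratio decisive, while the heavy t tails keep the two component densities comparable, hence the less extreme posterior.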
Robust mixture modelling using the t distribution
The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
Breakdown points for maximum likelihood estimators of location–scale mixtures
It turns out that the two alternatives, while adding stability in the presence of outliers of moderate size, do not possess a substantially better breakdown behavior than estimation based on Normal mixtures.
Hypothesis Testing for Mixture Model Selection
ABSTRACT Gaussian mixture models with eigen-decomposed covariance structures, i.e. the Gaussian parsimonious clustering models (GPCM), make up the most popular family of mixture models for clustering
Extending mixtures of multivariate t-factor analyzers
The extension of the mixtures of multivariate t-factor analyzers model is described to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices to create a family of six mixture models, including parsimonious models.
Constrained Optimization for a Subset of the Gaussian Parsimonious Clustering Models
A subset of the GPCM family is considered for model-based clustering, where a re-parameterized version of the famous eigenvalue decomposition of the component covariance matrices is used.
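The eigenvalue decomposition referred to in the GPCM entries above factors each component covariance as Sigma_g = lambda_g * D_g A_g D_g^T (volume, orientation, shape). A minimal numpy sketch, with an example matrix chosen arbitrarily:

```python
# Sketch of the GPCM-style decomposition Sigma = lambda * D A D^T, where
# lambda = |Sigma|^(1/d) is the volume, D collects the eigenvectors
# (orientation), and A is diagonal with det(A) = 1 (shape).
import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])        # arbitrary example covariance
eigvals, D = np.linalg.eigh(Sigma)    # columns of D are eigenvectors
lam = np.prod(eigvals) ** (1 / len(eigvals))  # volume: |Sigma|^(1/d)
A = np.diag(eigvals / lam)                    # shape matrix, det(A) = 1
reconstructed = lam * D @ A @ D.T
print(np.allclose(reconstructed, Sigma))  # True
```

Constraining lambda, D, or A to be equal across components (or A, D to identity) is what generates the parsimonious family of covariance structures.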
A likelihood ratio test of a homoscedastic normal mixture against a heteroscedastic normal mixture
  • Y. Lo, Stat. Comput., 2008
Simulations show that for small and medium sample sizes, parametric bootstrap tests appear to work well for determining whether data arise from a normal mixture with equal variances or a normal mixture with unequal variances.
Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions
A novel family of mixture models wherein each component is modeled using a multivariate t-distribution with an eigen-decomposed covariance structure is put forth, known as the tEIGEN family.
Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood
A method for assessing mixture models in a cluster analysis setting using the integrated completed likelihood appears to be more robust to violation of some of the mixture model assumptions, and it can select a number of clusters leading to a sensible partitioning of the data.