• Corpus ID: 228083390

Cluster analysis and outlier detection with missing data

  • Hung Tong, Cristina Tortora
  • Published 10 December 2020
  • Computer Science, Mathematics
  • arXiv: Methodology
A mixture of multivariate contaminated normal (MCN) distributions is a useful model-based clustering technique for data sets with mild outliers. However, the model can only be fitted to complete data sets, and real data sets are often incomplete. In this paper, we develop a framework for fitting a mixture of MCN distributions to incomplete data sets, i.e., data sets with some values missing at random. We employ the expectation-conditional maximization algorithm for…
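To make the model in the abstract concrete, the following is a minimal sketch of a single contaminated normal component evaluated on an incomplete observation. It uses the standard contaminated normal form, alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov) with eta > 1, and treats NaN entries as missing by keeping only the observed coordinates. The function and parameter names (`mvn_pdf`, `mcn_observed`, `alpha`, `eta`) are illustrative assumptions and are not taken from the paper. This is not the authors' implementation, and it does not include their ECM fitting procedure.

```python
import numpy as np


def mvn_pdf(x, mean, cov):
    """Multivariate normal density at a single point x (1-D array)."""
    diff = x - mean
    # Solve a linear system instead of inverting cov explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (x.size * np.log(2.0 * np.pi) + logdet + maha))


def mcn_observed(x, mean, cov, alpha, eta):
    """Contaminated normal density on the observed coordinates of x.

    NaN entries of x are treated as missing. For Gaussian components,
    integrating them out amounts to dropping the matching entries of the
    mean and the matching rows and columns of the covariance.

    Returns (density, p_good), where p_good is the posterior probability
    that x comes from the typical (non-inflated) part of the component.
    """
    obs = ~np.isnan(x)
    if not obs.any():
        raise ValueError("x has no observed coordinates")
    x_o, mean_o = x[obs], mean[obs]
    cov_oo = cov[np.ix_(obs, obs)]

    good = alpha * mvn_pdf(x_o, mean_o, cov_oo)
    # Outliers share the same mean but have covariance inflated by eta.
    bad = (1.0 - alpha) * mvn_pdf(x_o, mean_o, eta * cov_oo)
    density = good + bad
    return density, good / density


# Hypothetical parameters, chosen only for illustration.
mean = np.zeros(2)
cov = np.eye(2)
for point in ([0.0, np.nan], [8.0, np.nan], [0.0, 0.0]):
    density, p_good = mcn_observed(np.array(point), mean, cov, alpha=0.9, eta=10.0)
    print(point, density, p_good)
```

A point would typically be flagged as an outlier when `p_good` falls below 0.5. For points extremely far from the mean both terms can underflow to zero, so a more careful version would work with log densities.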


Parsimonious mixtures of multivariate contaminated normal distributions
A mixture of multivariate contaminated normal distributions is developed for model-based clustering. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all members of the resulting family are provided.
Robust Cluster Analysis and Variable Selection
Contents: Introduction; Mixture and classification models and their likelihood estimators; General consistency and asymptotic normality; Local likelihood estimates; Maximum likelihood estimates; Notes; Mixture…
Maximum likelihood estimation via the ECM algorithm: A general framework
Two major reasons for the popularity of the EM algorithm are that its maximum step involves only complete-data maximum likelihood estimation, which is often computationally simple, and that its…
MICE: Multivariate Imputation by Chained Equations in R
Mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Mixture Model-Based Classification
Robust mixture modelling using multivariate t-distribution with missing information
  • Pattern Recognition Letters
  • 2004
Maximum Likelihood Estimation via the ECM Algorithm: A General Framework
  • Biometrika
  • 1993