Variable selection methods for model-based clustering

@article{Fop2018VariableSM,
  title={Variable selection methods for model-based clustering},
  author={Michael Fop and Thomas Brendan Murphy},
  journal={Statistics Surveys},
  year={2018},
  volume={12},
  pages={18--65}
}
Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to deal with the increasing dimensionality. In particular, the development of variable selection techniques has received a lot of attention and research effort in recent years. Even for small size problems, variable selection has been advocated to facilitate the… 
Variable diagnostics in model-based clustering through variation partition
TLDR
A new method based on fuzzy variation decomposition is proposed for probabilistically assessing the contribution of variables to a detected dataset partition, and is applied to real-life datasets with promising results.
VarSelLCM: an R/C++ package for variable selection in model-based clustering of mixed-data with missing values
TLDR
VarSelLCM allows a full model selection in model-based clustering, according to classical information criteria, and allows data imputation by using mixture models.
Model-based Clustering using Automatic Differentiation: Confronting Misspecification and High-Dimensional Data
TLDR
This work designs a new penalty term for the likelihood based on the Kullback Leibler divergence between pairs of fitted components and demonstrates the efficacy of clustering using the proposed penalized likelihood approach.
Model-based clustering with sparse covariance matrices
TLDR
A penalized likelihood approach is employed for estimation and a general penalty term on the graph configurations can be used to induce different levels of sparsity and incorporate prior knowledge, which results in a parsimonious model-based clustering of the data via a flexible model for the within-group joint distribution of the variables.
Clustering in Misspecified and High-Dimensional Models using Automatic Differentiation
TLDR
This paper designs new KL divergence based model selection criteria and GD-based inference methods that use the criteria in fitting GMM on low- and high-dimensional data as well as for selecting the number of clusters.
Clustering and variable selection in the presence of mixed variable types and missing data.
TLDR
The goal of the work is to cluster patients thought to potentially have autism spectrum disorder into similar groups to help identify those with similar clinical presentation and identify a sparse subset of tests that inform the clusters in order to eliminate unnecessary testing.
A survey on feature selection methods for mixed data
TLDR
This paper provides the first comprehensive and structured review of the existing supervised and unsupervised feature selection methods for mixed data reported in the literature.
Survey on High-Dimensional Medical Data Clustering
TLDR
A survey of high-dimensional medical data clustering and the different approaches to this problem is presented, focusing on real-life applications and recent methods in high-dimensional cluster analysis.
High-Dimensional Clustering via Random Projections
TLDR
This work proposes to generate a set of low dimensional independent random projections and to perform model-based clustering on each of them and suggests that the proposal represents a promising tool for high-dimensional clustering.
Gaussian mixture model with feature selection: An embedded approach
TLDR
This paper introduces a relevancy index (RI), a metric indicating the probability of assigning a data point to a specific clustering group, which reveals the contribution of each feature to the clustering process and can thus assist feature selection.

References

SHOWING 1-10 OF 168 REFERENCES
Variable Selection for Model-Based Clustering
We consider the problem of variable or feature selection for model-based clustering. The problem of comparing two nested subsets of variables is recast as a model comparison problem and addressed
Model-based clustering of high-dimensional data: A review
TLDR
Existing software for model-based clustering of high-dimensional data is reviewed, its practical use is illustrated on real-world data sets, and clustering methods based on variable selection are reviewed.
Pairwise variable selection for high-dimensional model-based clustering.
TLDR
A pairwise variable selection method for high-dimensional model-based clustering based on a new pairwise penalty is proposed, and results show that the new method performs better than alternative approaches that use ℓ1 and ℓ∞ penalties and offers better interpretation.
Variable selection in model-based clustering and discriminant analysis with a regularization approach
TLDR
This paper proposes an alternative regularization approach for variable selection in model-based clustering and classification, in which the variables are first ranked using a lasso-like procedure in order to avoid slow stepwise algorithms.
Variable selection for model-based high-dimensional clustering and its application to microarray data.
TLDR
Numerical results indicate that the two new methods tend to remove noninformative variables more effectively and provide better clustering results than the L1-norm approach.
Penalized Model-Based Clustering with Application to Variable Selection
TLDR
A penalized likelihood approach with an L1 penalty function is proposed, automatically realizing variable selection via thresholding and delivering a sparse solution in model-based clustering analysis with a common diagonal covariance matrix.
Variable Selection for Clustering and Classification
TLDR
This paper introduces a novel variable selection technique for use in clustering and classification analyses that is both intuitive and computationally efficient and focuses largely on applications in mixture model-based learning but could be adapted for use with various other clustering/classification methods.
Variable selection for clustering with Gaussian mixture models.
TLDR
A model generalizing the model of Raftery and Dean (2006) is proposed to specify the role of each variable, which does not need any prior assumptions about the linear link between the selected and discarded variables.
Variable selection for mixed data clustering: a model-based approach
We propose two approaches for selecting variables in latent class analysis (i.e., mixture model assuming within component independence), which is the common model-based clustering method for mixed
Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures
Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have