Penalized Model-Based Clustering with Application to Variable Selection


Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for “high dimension, low sample size” settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.
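To illustrate how an L1 penalty can realize variable selection via thresholding, here is a minimal sketch of a soft-thresholding update for one cluster's mean vector. It is not taken from the paper: the function names, the threshold form `lam * variance / cluster_weight`, and the toy numbers are assumptions chosen for illustration, and the data are assumed to be standardized so that a mean of zero marks a variable as uninformative for that cluster.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 when |value| <= threshold."""
    magnitude = abs(value) - threshold
    if magnitude <= 0.0:
        return 0.0
    return magnitude if value > 0 else -magnitude


def penalized_mean_update(unpenalized_means, lam, variances, cluster_weight):
    """Soft-threshold each variable's unpenalized cluster mean.

    Assumed (hypothetical) threshold for variable k:
        lam * variances[k] / cluster_weight
    where cluster_weight stands for the sum of posterior membership
    probabilities of the cluster, as computed in an E-step.
    """
    return [
        soft_threshold(mean, lam * var / cluster_weight)
        for mean, var in zip(unpenalized_means, variances)
    ]


# Toy example: four variables, one cluster, common unit variances.
means = [1.5, -0.2, 0.05, -2.0]
variances = [1.0, 1.0, 1.0, 1.0]
updated = penalized_mean_update(means, lam=3.0, variances=variances, cluster_weight=10.0)

# Variables whose mean survives the thresholding remain in the model for this cluster.
selected = [k for k, mean in enumerate(updated) if mean != 0.0]
```

In this sketch the small means are set exactly to zero while the large ones are only shrunk, which is the sense in which the solution is sparse. Under the standardization assumption, a variable whose mean is thresholded to zero in every cluster would contribute nothing to the clustering.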


Cite this paper

@article{Pan2007PenalizedMC,
  title={Penalized Model-Based Clustering with Application to Variable Selection},
  author={Wei Pan and Xiaotong Shen},
  journal={Journal of Machine Learning Research},
  year={2007},
  volume={8},
  pages={1145-1164}
}