Supervised clustering of high dimensional data using regularized mixture modeling

Wennan Chang, Changlin Wan, Yong Zang, Chi Zhang, Sha Cao. Briefings in Bioinformatics, volume 22, issue 4.
Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between the high-dimensional genetic manifestations and the clinical presentations, while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm using a penalized mixture regression model, called component-wise sparse mixture regression (CSMR… 
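The CSMR abstract is truncated here, so its exact algorithm is not reproduced. As a generic illustration of how a mixture regression model performs supervised clustering, the sketch below implements the E-step of an EM algorithm for a K-component Gaussian mixture of linear regressions. Each subject receives a posterior probability of belonging to each regression component. The function name `responsibilities` and its parameters are hypothetical and are not taken from the paper.

```python
import math

def responsibilities(X, y, betas, sigmas, pis):
    """E-step of EM for a Gaussian mixture of linear regressions (illustrative sketch).

    X: list of feature rows, y: list of responses,
    betas[k]: coefficients of component k, sigmas[k]: its noise SD,
    pis[k]: its mixing proportion. Returns tau[i][k], the posterior
    probability that subject i belongs to component k.
    """
    tau = []
    for xi, yi in zip(X, y):
        # log of pi_k * N(y_i; x_i^T beta_k, sigma_k^2) for each component
        logs = []
        for beta, sigma, pi in zip(betas, sigmas, pis):
            resid = yi - sum(a * b for a, b in zip(xi, beta))
            logs.append(math.log(pi) - math.log(sigma)
                        - 0.5 * math.log(2 * math.pi)
                        - 0.5 * (resid / sigma) ** 2)
        # normalize with the log-sum-exp trick for numerical stability
        m = max(logs)
        w = [math.exp(l - m) for l in logs]
        s = sum(w)
        tau.append([wk / s for wk in w])
    return tau

# Two hypothetical subgroups following y = 2x and y = -2x
X = [[1.0], [2.0], [1.0], [2.0]]
y = [2.0, 4.1, -2.0, -3.9]
tau = responsibilities(X, y, betas=[[2.0], [-2.0]], sigmas=[0.5, 0.5], pis=[0.5, 0.5])
labels = [max(range(2), key=lambda k: row[k]) for row in tau]
print(labels)  # [0, 0, 1, 1]
```

A full EM loop would alternate this step with a (possibly penalized) weighted regression fit for each component.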


Robust structured heterogeneity analysis approach for high‐dimensional data

A robust structured heterogeneity analysis approach to identify subgroups, select important genes as well as estimate their effects on the phenotype of interest is developed and outperforms alternatives in revealing the heterogeneity and selecting important genes for each subgroup.

RobMixReg: an R package for robust, flexible and high dimensional mixture regression

An R package called RobMixReg is developed, which provides comprehensive solutions for robust, flexible, and high-dimensional mixture regression modeling.

SSMD: A semi-supervised approach for a robust cell type identification and deconvolution of mouse transcriptomics data

SSMD is a novel tissue deconvolution method specifically designed for mouse data, handling the variations caused by different mouse strains, genetic and phenotypic backgrounds, and experimental platforms; it achieves much improved performance in estimating the relative proportions of cell types compared with state-of-the-art methods.



Drug sensitivity prediction with high-dimensional mixture regression

The numerical results indicate that the proposed method can make a drastic improvement over existing methods, such as random forest, support vector regression, and regularized linear regression, in both drug sensitivity prediction and feature selection.

Feature selection in finite mixture of sparse normal linear models in high-dimensional feature space.

This work considers the problem of feature selection in finite mixture of sparse normal linear (FMSL) models in large feature spaces and proposes a 2-stage procedure to overcome computational difficulties and large false discovery rates caused by the large model space.

A new method for robust mixture regression

Chun Yu, W. Yao, Kun Chen. The Canadian Journal of Statistics = Revue canadienne de statistique, 2017.
A penalized likelihood approach is adopted to induce sparsity among the mean‐shift parameters so that the outliers are distinguished from the remainder of the data, and a generalized Expectation–Maximization algorithm is developed to perform stable and efficient computation.
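The mean-shift idea can be illustrated in a simplified, single-component setting. Each observation gets its own shift γ_i in y_i = a + b·x_i + γ_i + ε_i. An ℓ1 penalty on γ sets most shifts exactly to zero, so the nonzero shifts flag outliers. The sketch minimizes ½‖y − a − bx − γ‖² + λ‖γ‖₁ by alternating ordinary least squares for (a, b) with soft-thresholding of the residuals for γ. The ℓ1 penalty, the fixed λ, the single regression line and all function names are assumptions made for illustration. This is not the paper's generalized EM algorithm for the mixture case.

```python
def soft(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def ols_line(x, y):
    """Closed-form least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def mean_shift_fit(x, y, lam, n_iter=100):
    """Minimize 0.5*||y - a - b*x - gamma||^2 + lam*||gamma||_1 by
    alternating OLS on the shifted response and soft-thresholding."""
    gamma = [0.0] * len(y)
    for _ in range(n_iter):
        a, b = ols_line(x, [yi - gi for yi, gi in zip(y, gamma)])
        gamma = [soft(yi - (a + b * xi), lam) for xi, yi in zip(x, y)]
    return a, b, gamma

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1.0, 3.0, 5.0, 7.0, 30.0, 11.0, 13.0, 15.0]  # y = 1 + 2x, one gross outlier at x = 4
a, b, gamma = mean_shift_fit(x, y, lam=2.0)
outliers = [i for i, g in enumerate(gamma) if g != 0.0]
print(outliers)  # [4]
```

Plain least squares on these data gives a slope of 2.25. The mean-shift fit flags index 4 and returns a slope close to 2. The soft-threshold still leaves a small bias, which is one reason nonconvex penalties on γ are also used.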

Bi-clustering based biological and clinical characterization of colorectal cancer in complementary to CMS classification

Analysis of multiple large-scale CRC transcriptomics data sets using a bi-clustering based formulation suggests that the detected local low-rank modules not only generate new biological understanding consistent with CMS stratification, but also identify predictive markers for prognosis that are either general to CRC or CMS-dependent, as well as novel alternative drug resistance mechanisms.

ℓ1-penalization for mixture regression models

We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than sample size. We propose an ℓ1-penalized maximum likelihood estimator
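In EM-type fitting of an ℓ1-penalized mixture of regressions, the update for each component's coefficients (with its variance held fixed) reduces to a weighted lasso. The weights are the posterior memberships from the E-step. Below is a minimal coordinate-descent sketch of that sub-problem. The objective scaling ½Σ w_i(y_i − x_iᵀβ)² + λ‖β‖₁, the absence of an intercept, the iteration count and the function names are illustrative assumptions. The cited paper works in a rescaled parameterization that this sketch ignores.

```python
def soft(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def weighted_lasso(X, y, w, lam, n_iter=200):
    """Coordinate descent for 0.5*sum_i w_i*(y_i - x_i^T beta)^2 + lam*||beta||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta for the current beta
    for _ in range(n_iter):
        for j in range(p):
            denom = sum(w[i] * X[i][j] ** 2 for i in range(n))
            if denom == 0.0:
                continue
            # weighted inner product with the partial residual (coordinate j added back)
            rho = sum(w[i] * X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            new = soft(rho, lam) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

# Orthogonal toy design; unpenalized least squares would give [3.0, 0.5]
X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [3.0, 3.0, 0.5, 0.5]
beta = weighted_lasso(X, y, w=[1.0, 1.0, 1.0, 1.0], lam=1.0)
print(beta)  # [2.5, 0.0]
```

In a mixture fit, `w` would be one column of the E-step responsibilities, and the call would be repeated for each component.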

Finite Mixture Models

The aim of this article is to provide an up-to-date account of the theory and methodological developments underlying the applications of finite mixture models.
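For reference, the generic finite mixture density, its Gaussian mixture-of-regressions special case, and the posterior membership probabilities τ_ik used for clustering can be written as follows. The symbols π_k, θ_k, β_k, σ_k and τ_ik are notation chosen for this illustration, and φ(·; μ, σ²) denotes the normal density.

```latex
\begin{align*}
f(\mathbf{y}) &= \sum_{k=1}^{K} \pi_k\, f_k(\mathbf{y};\boldsymbol{\theta}_k),
  \qquad \pi_k \ge 0,\ \sum_{k=1}^{K}\pi_k = 1,\\
f(y \mid \mathbf{x}) &= \sum_{k=1}^{K} \pi_k\,
  \phi\bigl(y;\ \mathbf{x}^\top\boldsymbol{\beta}_k,\ \sigma_k^2\bigr),\\
\tau_{ik} &= \frac{\pi_k\, \phi\bigl(y_i;\ \mathbf{x}_i^\top\boldsymbol{\beta}_k,\ \sigma_k^2\bigr)}
  {\sum_{l=1}^{K}\pi_l\, \phi\bigl(y_i;\ \mathbf{x}_i^\top\boldsymbol{\beta}_l,\ \sigma_l^2\bigr)}.
\end{align*}
```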

Model-Based Clustering, Discriminant Analysis, and Density Estimation

This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
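One common answer to "how many clusters are there" in model-based clustering is to fit mixtures with different K and compare an information criterion such as BIC. The sketch below fits 1-D Gaussian mixtures by EM and uses BIC = −2 log L + m log n, where m = 3K − 1 free parameters, with lower values preferred. The deterministic initialization at spread-out order statistics, the variance floor, the iteration count and the function names are assumptions made for illustration. Some software (for example R's mclust) reports BIC with the opposite sign and maximizes it.

```python
import math

def _component_logs(xi, pi, mu, var):
    """log(pi_k) + log N(xi; mu_k, var_k) for every component k."""
    return [math.log(p) - 0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for p, m, v in zip(pi, mu, var)]

def _logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def fit_gmm1d(x, K, n_iter=200, var_floor=1e-3):
    """EM for a 1-D Gaussian mixture with K components; returns the log-likelihood."""
    n = len(x)
    xs = sorted(x)
    mu = [xs[int((k + 0.5) * n / K)] for k in range(K)]  # spread-out initial means
    mean = sum(x) / n
    var = [max(sum((xi - mean) ** 2 for xi in x) / n, var_floor)] * K
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau[i][k]
        tau = []
        for xi in x:
            logs = _component_logs(xi, pi, mu, var)
            norm = _logsumexp(logs)
            tau.append([math.exp(l - norm) for l in logs])
        # M-step: update proportions, means and (floored) variances
        for k in range(K):
            nk = max(sum(t[k] for t in tau), 1e-12)
            pi[k] = nk / n
            mu[k] = sum(t[k] * xi for t, xi in zip(tau, x)) / nk
            var[k] = max(sum(t[k] * (xi - mu[k]) ** 2 for t, xi in zip(tau, x)) / nk,
                         var_floor)
    return sum(_logsumexp(_component_logs(xi, pi, mu, var)) for xi in x)

def bic(loglik, K, n):
    """BIC for a 1-D Gaussian mixture: K means, K variances, K - 1 free proportions."""
    return -2.0 * loglik + (3 * K - 1) * math.log(n)

x = [-5.2, -5.1, -5.0, -4.9, -4.8, 4.8, 4.9, 5.0, 5.1, 5.2]
scores = {K: bic(fit_gmm1d(x, K), K, len(x)) for K in (1, 2)}
best_K = min(scores, key=scores.get)
print(best_K)  # 2 for this clearly bimodal sample
```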

Penalized Model-Based Clustering with Application to Variable Selection

A penalized likelihood approach with an L1 penalty function is proposed, automatically realizing variable selection via thresholding and delivering a sparse solution in model-based clustering analysis with a common diagonal covariance matrix.
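To see where the thresholding comes from, assume standardized data, a common diagonal covariance diag(σ_1², …, σ_p²), and a penalty λ Σ_k Σ_j |μ_kj| subtracted from the log-likelihood. Maximizing the expected complete-data objective over μ_kj then gives a soft-thresholded weighted mean, with threshold λσ_j² / Σ_i τ_ik. A variable whose mean is zero in every cluster is effectively deselected. The sketch below implements only this mean update, given E-step weights `tau`. The function names and the toy data are hypothetical.

```python
def soft(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def penalized_means(X, tau, sigma2, lam):
    """L1-penalized M-step for cluster means (sketch; assumes standardized X
    and a common diagonal covariance diag(sigma2))."""
    p, K = len(X[0]), len(tau[0])
    mu = []
    for k in range(K):
        nk = sum(t[k] for t in tau)  # effective cluster size
        row = []
        for j in range(p):
            # unpenalized weighted mean, then shrink toward zero
            mu_tilde = sum(t[k] * xi[j] for t, xi in zip(tau, X)) / nk
            row.append(soft(mu_tilde, lam * sigma2[j] / nk))
        mu.append(row)
    return mu

def informative_variables(mu):
    """Indices of variables with a nonzero mean in at least one cluster."""
    return [j for j in range(len(mu[0])) if any(m[j] != 0.0 for m in mu)]

# Feature 0 separates the two clusters; feature 1 is noise
X = [[1.0, 0.2], [1.0, 0.0], [-1.0, -0.1], [-1.0, 0.1]]
tau = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # hard memberships for clarity
mu = penalized_means(X, tau, sigma2=[1.0, 1.0], lam=0.5)
print(mu)                         # [[0.75, 0.0], [-0.75, 0.0]]
print(informative_variables(mu))  # [0]
```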

Systematic Assessment of Analytical Methods for Drug Sensitivity Prediction from Cancer Cell Line Data

This work evaluated over 110,000 different models, based on a multifactorial experimental design testing systematic combinations of modeling factors within several categories of modeling choices, suggesting that model input data and choice of compound are the primary factors explaining model performance.

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
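The best-known penalty from this line of work is SCAD (smoothly clipped absolute deviation). With an orthonormal design, penalized least squares under SCAD reduces to a componentwise thresholding rule. Small coefficients are soft-thresholded, intermediate ones are shrunk progressively less, and large ones (|z| > aλ) are left unchanged. This lack of shrinkage on large coefficients is what supports the oracle-type behavior described above. The sketch uses the commonly cited default a = 3.7; the function names are illustrative.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|): linear near zero, quadratic transition, then constant."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t * t - 2 * a * lam * t + lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule for penalized least squares with an orthonormal design."""
    sign = 1.0 if z > 0 else (-1.0 if z < 0 else 0.0)
    az = abs(z)
    if az <= 2 * lam:
        return sign * max(az - lam, 0.0)                 # soft-thresholding zone
    if az <= a * lam:
        return ((a - 1) * z - sign * a * lam) / (a - 2)  # transition zone
    return z                                             # large effects left unshrunk

print([round(scad_threshold(z, 1.0), 3) for z in (0.5, 1.5, 3.0, 5.0)])
# [0.0, 0.5, 2.588, 5.0]
```

Both pieces join continuously: at |z| = 2λ the transition formula equals the soft-threshold value, and at |z| = aλ it equals z.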