• Corpus ID: 220686387

Outcome-Guided Disease Subtyping for High-Dimensional Omics Data

Peng Liu, Yusi Fang, Zhao Ren, Lu Tang, George C. Tseng. arXiv: Methodology.
High-throughput microarray and sequencing technologies have been used to identify disease subtypes that could not be observed using clinical variables alone. The classical unsupervised clustering strategy is primarily concerned with identifying subpopulations that have similar patterns in gene features. However, because features corresponding to irrelevant confounders (e.g., gender or age) may dominate the clustering process, the resulting clusters may or may not capture clinically…

Semi-supervised recursively partitioned mixture models for identifying cancer subtypes

This work proposes semi-supervised recursively partitioned mixture models (SS-RPMM), a method that uses array-based genetic data and patient-level clinical data to find cancer subtypes associated with patient survival; the method compared favorably with competing semi-supervised methods.

Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data

Diagnostic procedures are presented that accurately predict the survival of future patients based on the gene expression profiles and survival times of previous patients; the procedures have been applied successfully to several publicly available datasets.

Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.

This manuscript proposes an integrative sparse K-means (is-K means) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso and demonstrates its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency.

Supervised Bayesian latent class models for high‐dimensional data

This work proposes two latent class models for classification and variable selection in the presence of high‐dimensional binary data, fit by using Bayesian Markov chain Monte Carlo techniques and applies these methodologies to the glioma study for which identifiable three‐class parameter estimates cannot be obtained without penalization.

Semi‐supervised clustering methods

  • E. Bair
  • Computer Science
    Wiley interdisciplinary reviews. Computational statistics
  • 2013
Several clustering algorithms are described that can be applied in many situations, including document processing and modern genetics, to identify clusters associated with a particular outcome variable.

Molecular portraits of human breast tumours

Variation in gene expression patterns in a set of 65 surgical specimens of human breast tumours from 42 different individuals was characterized using complementary DNA microarrays representing 8,102 human genes, providing a distinctive molecular portrait of each tumour.

A New Principle for Tuning-Free Huber Regression

The robustification parameter, which balances bias and robustness, has played a critical role in the construction of sub-Gaussian estimators

A penalized latent class model for ordinal data.

By stabilizing maximum likelihood estimation, this work is able to fit an ordinal latent class model that would otherwise not be identifiable without application of strict constraints to facilitate analysis of high-dimensional ordinal data.

A Framework for Feature Selection in Clustering

A novel framework for sparse clustering is proposed, in which one clusters the observations using an adaptively chosen subset of the features, which uses a lasso-type penalty to select the features.

Adaptive Huber Regression

A sharp phase transition is established for robust estimation of regression parameters in both low and high dimensions: in one regime the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the other regime, and the transition is smooth and optimal.