Outcome-Guided Disease Subtyping for High-Dimensional Omics Data
@article{Liu2020OutcomeGuidedDS, title={Outcome-Guided Disease Subtyping for High-Dimensional Omics Data}, author={Peng Liu and Yusi Fang and Zhao Ren and Lu Tang and George C. Tseng}, journal={arXiv: Methodology}, year={2020} }
High-throughput microarray and sequencing technologies have been used to identify disease subtypes that could not be observed by using clinical variables alone. The classical unsupervised clustering strategy is concerned primarily with identifying subpopulations that have similar patterns in gene features. However, as the features corresponding to irrelevant confounders (e.g. gender or age) may dominate the clustering process, the resulting clusters may or may not capture clinically…
References
Semi-supervised recursively partitioned mixture models for identifying cancer subtypes
- Biology
- Bioinform.
- 2010
This work proposes a method called semi-supervised recursively partitioned mixture models (SS-RPMM) that uses array-based genetic data and patient-level clinical data to find cancer subtypes associated with patient survival; the method compared favorably with competing semi-supervised methods.
Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data
- Biology, Computer Science
- PLoS biology
- 2004
Diagnostic procedures are presented that accurately predict the survival of future patients based on the gene expression profile and survival times of previous patients that have been successfully applied to several publicly available datasets.
Integrative Sparse K-Means With Overlapping Group Lasso in Genomic Applications for Disease Subtype Discovery.
- Computer Science
- The annals of applied statistics
- 2017
This manuscript proposes an integrative sparse K-means (is-K means) approach to discover disease subtypes with the guidance of prior biological knowledge via sparse overlapping group lasso and demonstrates its superior clustering accuracy, feature selection, functional annotation of detected molecular features and computing efficiency.
Supervised Bayesian latent class models for high‐dimensional data
- Computer Science
- Statistics in medicine
- 2012
This work proposes two latent class models for classification and variable selection in the presence of high‐dimensional binary data, fit by using Bayesian Markov chain Monte Carlo techniques and applies these methodologies to the glioma study for which identifiable three‐class parameter estimates cannot be obtained without penalization.
Semi‐supervised clustering methods
- Computer Science
- Wiley interdisciplinary reviews. Computational statistics
- 2013
Several clustering algorithms are described that identify clusters associated with a particular outcome variable; they can be applied in many settings, including document processing and modern genetics.
Molecular portraits of human breast tumours
- Biology, Medicine
- Nature
- 2000
Variation in gene expression patterns in a set of 65 surgical specimens of human breast tumours from 42 different individuals was characterized using complementary DNA microarrays representing 8,102 human genes, providing a distinctive molecular portrait of each tumour.
A New Principle for Tuning-Free Huber Regression
- Mathematics
- Statistica Sinica
- 2021
The robustification parameter, which balances bias and robustness, has played a critical role in the construction of sub-Gaussian estimators…
A penalized latent class model for ordinal data.
- Computer Science
- Biostatistics
- 2008
By stabilizing maximum likelihood estimation, this work is able to fit an ordinal latent class model that would otherwise not be identifiable without application of strict constraints to facilitate analysis of high-dimensional ordinal data.
A Framework for Feature Selection in Clustering
- Computer Science
- Journal of the American Statistical Association
- 2010
A novel framework for sparse clustering is proposed, in which one clusters the observations using an adaptively chosen subset of the features; a lasso-type penalty is used to select the features.
Adaptive Huber Regression
- Computer Science
- Journal of the American Statistical Association
- 2020
A sharp phase transition is established for robust estimation of regression parameters in both low and high dimensions: when, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime and the transition is smooth and optimal.