Improved high-dimensional prediction with Random Forests by the use of co-data

@article{Beest2017ImprovedHP,
  title={Improved high-dimensional prediction with Random Forests by the use of co-data},
  author={Dennis E. te Beest and Steven W. Mes and Saskia M. Wilting and Ruud H. Brakenhoff and Mark A. van de Wiel},
  journal={BMC Bioinformatics},
  year={2017},
  volume={18}
}
Background: Prediction in high-dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary ‘co-data’ can be used to improve the performance of a Random Forest in such a setting.

Results: Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities that are used to draw candidate variables with co-data moderated sampling probabilities. Co-data here are defined as any type of information that is available…
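The core idea in the abstract — replacing the uniform draw of split candidates with co-data weighted probabilities — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the co-data scores and the `draw_candidates` helper are hypothetical stand-ins for real external information.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 1000                 # number of features
mtry = int(np.sqrt(p))   # usual RF number of candidate variables per split

# Hypothetical co-data: one external relevance score per feature,
# e.g. derived from results of a previous related study.
codata_scores = rng.random(p)

# A uniform RF draws each feature with probability 1/p; co-data moderated
# sampling normalizes the scores into a probability vector instead.
probs = codata_scores / codata_scores.sum()

def draw_candidates(rng, probs, mtry):
    """Draw one split's candidate variables with co-data weighted probabilities."""
    return rng.choice(len(probs), size=mtry, replace=False, p=probs)

candidates = draw_candidates(rng, probs, mtry)
```

Features with larger co-data scores are proposed as split candidates more often, so informative variables are more likely to enter the trees without being forced in.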
High-Dimensional Random Forests with Ridge Regression Based Variable Screening
TLDR
A new algorithm is presented that incorporates ridge regression as a variable screening tool to discern informative features in the setting of high dimensions and applies the classical random forest to a top portion of the selected important features.
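The two-stage pipeline this summary describes — ridge regression as a screening tool, then a classical random forest on the top-ranked features — might be sketched with scikit-learn as follows; the simulated data, penalty `alpha`, and cutoff `k` are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n, p, k = 100, 500, 50          # samples, features, features kept after screening
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)  # two informative features

# Stage 1: ridge regression screens features; rank by absolute coefficient.
ridge = Ridge(alpha=1.0).fit(X, y)
top = np.argsort(np.abs(ridge.coef_))[::-1][:k]

# Stage 2: fit a classical random forest on the screened top-k features only.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[:, top], y)
```

Because ridge handles p ≫ n gracefully, the screening stage stays stable where per-feature tests or unpenalized regression would not.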
Intervention in prediction measure: a new approach to assessing variable importance for random forests
TLDR
A new alternative importance measure, called the Intervention in Prediction Measure, is investigated; it depends on the structure of the trees rather than on performance measures, and is expressed as a percentage, which makes it attractive in terms of interpretability.
Learning from a lot: Empirical Bayes in high-dimensional prediction settings
TLDR
It is argued that empirical Bayes is particularly useful when the prior contains multiple parameters which model a priori information on variables, termed ‘co-data’, and two novel examples that allow for co-data are presented.
Learning from a lot: Empirical Bayes for high‐dimensional model‐based prediction
TLDR
It is argued that empirical Bayes is particularly useful when the prior contains multiple parameters which model a priori information on variables, termed “co‐data”, and two novel examples that allow for co‐data are presented.
Large-scale variational inference for Bayesian joint regression modelling of high-dimensional genetic data
TLDR
A Bayesian hierarchical approach for joint analysis of QTL data on a genome-wide scale that allows information-sharing across outcomes and variants, thereby enhancing the detection of weak trans and hotspot effects, and implements tailored variational inference procedures that allow simultaneous analysis of data for an entire QTL study.
Adaptive group-regularized logistic elastic net regression
TLDR
Simulations and applications to three cancer genomics studies and one Alzheimer metabolomics study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.
Crossbreeding in Random Forest
TLDR
This paper presents a novel approach based on crossbreeding of the best tree branches to improve the space and speed performance of RF while preserving its classification performance.
Multi-task deep autoencoder to predict Alzheimer's disease progression using temporal DNA methylation data in peripheral blood
  • L. Chen • Computer Science • medRxiv • 2022
TLDR
It is demonstrated that the multi-task deep autoencoders outperform state-of-the-art machine learning approaches for both predicting AD progression and reconstructing the temporal methylation profiles.

References

SHOWING 1-10 OF 50 REFERENCES
High-Dimensional Variable Selection for Survival Data
TLDR
This work derives the distribution of the minimal depth and uses it for high-dimensional variable selection using random survival forests, and develops a new regularized algorithm, termed RSF-Variable Hunting.
Better prediction by use of co‐data: adaptive group‐regularized ridge regression
TLDR
A method for adaptive group‐regularized (logistic) ridge regression is presented, which makes structural use of ‘co‐data’; it improves the predictive performance of ordinary logistic ridge regression and the group lasso, and derives empirical Bayes estimates of group‐specific penalties.
AUC-RF: A New Strategy for Genomic Profiling with Random Forest
TLDR
This work proposes a new algorithm for genomic profiling based on optimizing the area under the receiver operating characteristic curve (AUC) of the random forest (RF), which implements a backward elimination process based on the initial ranking of variables.
Gene selection and classification of microarray data using random forest
TLDR
It is shown that random forest has comparable performance to other classification methods, including DLDA, KNN, and SVM, and that the new gene selection procedure yields very small sets of genes (often smaller than alternative methods) while preserving predictive accuracy.
Weighted Lasso with Data Integration
TLDR
Through simulations, it is shown that the weighted lasso with integrated relevant external information on the covariates outperforms the lasso and the adaptive lasso when the external information ranges from relevant to partly relevant, in terms of both variable selection and prediction.
Enriched random forests
TLDR
This work proposes a novel, yet simple, adjustment that has demonstrably superior performance: choose the eligible subsets at each node by weighted random sampling instead of simple random sampling, with the weights tilted in favor of the informative features.
Random generalized linear model: a highly accurate and interpretable ensemble predictor
TLDR
RGLM is a state of the art predictor that shares the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward selected generalized linear model (interpretability).
Random Forests
TLDR
Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures
TLDR
It is shown that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation, and Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches.
Random Lasso
TLDR
The proposed random lasso method alleviates a limitation of the lasso, elastic-net, and related methods noted especially in the context of microarray data analysis: those methods tend to remove highly correlated variables altogether or select them all, whereas random lasso maintains maximal flexibility in estimating their coefficients.