Thresholding for biomarker selection in multivariate data using Higher Criticism.
@article{Wehrens2012ThresholdingFB,
title={Thresholding for biomarker selection in multivariate data using Higher Criticism.},
author={Ron Wehrens and Pietro Franceschi},
journal={Molecular BioSystems},
year={2012},
volume={8},
number={9},
pages={2339--2346}
}

Biomarker selection is an important topic in the omics sciences, where holistic measurement methods routinely generate results for many variables simultaneously. Very often, only a small fraction of these variables are really associated with the phenomena of interest. Selection and identification of these biomarkers is essential for obtaining an understanding of the complex biological processes under study. Finding biomarkers, however, is a difficult task. Even if a relative order can be…
13 Citations
Meta-Statistics for Variable Selection: The R Package BioMark
- Computer Science
- 2012
An R package is proposed, BioMark, implementing two meta-statistics for variable selection, each of which presents a data-dependent selection threshold for significance, and it is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data.
Stable biomarker screening and classification by subsampling-based sparse regularization coupled with support vector machines in metabolomics
- Computer Science
- 2017
Reflections on univariate and multivariate analysis of metabolomics data
- Mathematics
- Metabolomics
- 2013
Applications of the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis are shown on both real and simulated metabolomics data examples to provide an overview of fundamental aspects of univariate and multivariate methods.
Higher Criticism for Large-Scale Inference: especially for Rare and Weak effects
- Computer Science
- 2014
The Rare/Weak (RW) model is a theoretical framework that simultaneously controls the size and prevalence of useful/significant items among the useless/null bulk. Within this framework, HC is shown to have important advantages over better-known procedures such as False Discovery Rate (FDR) control and Family-wise Error control (FwER), in particular certain optimality properties.
Analysis of the Human Adult Urinary Metabolome Variations with Age, Body Mass Index, and Gender by Implementing a Comprehensive Workflow for Univariate and OPLS Statistical Analyses.
- Biology
- Journal of Proteome Research
- 2015
The impact of gender and age on the urinary metabolome is highlighted, indicating that these factors should be taken into account in the design of metabolomics studies.
Data Fusion in Metabolomics and Proteomics for Biomarker Discovery.
- Biology
- Methods in Molecular Biology
- 2016
This work describes a framework for combining multiple data sets provided by different analytical platforms, and describes how the obtained latent variables are fused and further analyzed.
Statistical Analysis in Proteomics
- Biology
- Methods in Molecular Biology
- 2016
The basic concepts behind proteomics mass spectrometry and the accompanying topic of protein and peptide separations are presented, with a focus on the properties of datasets emerging from such studies.
The power of tests for signal detection in high-dimensional data
- Mathematics
- 2017
In this thesis, we are interested in the testing problem, whether there are rare and weak signals (alternative) or no signals (null) within white noise background. To be more specific, we study the…
Signal detection in extracellular neural ensemble recordings using higher criticism
- Computer Science
- ICBHI 2019
- 2019
A robust strategy is presented for detecting signals in broadband and noisy time series, such as spikes, sharp waves and multi-unit activity data, based solely on the intrinsic statistical distribution of the recorded data.
Cutaneous expressions of interleukin-6 and neutrophil elastase as well as levels of serum IgA antibodies to gliadin nonapeptides, tissue transglutaminase and epidermal transglutaminase: implications for both autoimmunity and autoinflammation involvement in dermatitis herpetiformis
- Biology, Medicine
- Central-European Journal of Immunology
- 2014
Results might indicate the heterogenetic nature of DH pathogenesis, further suggesting that both autoimmune and autoinflammatory phenomena may be involved in DH cutaneous pathology.
References
Assessing the statistical validity of proteomics based biomarkers.
- Biology
- Analytica Chimica Acta
- 2007
Gene ranking and biomarker discovery under correlation
- Biology
- Bioinform.
- 2009
A simple procedure is proposed that adjusts gene-wise t-statistics to take account of correlations among genes. It improves estimation of gene orderings and leads to higher power for a fixed true discovery rate, and vice versa.
Higher criticism thresholding: Optimal feature selection when useful features are rare and weak
- Computer Science
- Proceedings of the National Academy of Sciences
- 2008
In the most challenging RW settings, HCT uses an unconventionally low threshold, which keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance.
Metabolic profiling and the metabolome-wide association study: significance level for biomarker identification.
- Biology
- Journal of Proteome Research
- 2010
The results show that the MWSL approach as estimated by the univariate t test is not outperformed by OPLS and offers a fast and simple method to detect disease-related discriminatory features in human NMR urinary metabolic profiles.
Classification and biomarker identification using gene network modules and support vector machines
- Computer Science, Biology
- BMC Bioinformatics
- 2009
It is demonstrated that more than 90% accuracy can be obtained in classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays.
Classification of Genes and Putative Biomarker Identification Using Distribution Metrics on Expression Profiles
- Biology
- PLoS ONE
- 2010
A systematic classification of genes and a search for biomarkers were carried out on more than 16,000 genes from 2,145 mouse array samples, and GEPs were identified as tissue-specific biomarker candidate genes.
A bioinformatics approach for biomarker identification in radiation-induced lung inflammation from limited proteomics data.
- Biology
- Journal of Proteome Research
- 2011
A novel graph-based scoring function and a novel bioinformatics ranking algorithm are proposed to rank and identify the most robust biomarkers from limited proteomics data. The results suggest that the proposed methodology is a potentially promising approach to the challenging problem of identifying relevant biomarkers in sample-limited clinical applications.
Statistical significance for genomewide studies
- Biology
- Proceedings of the National Academy of Sciences of the United States of America
- 2003
This work proposes an approach to measuring statistical significance in genomewide studies based on the concept of the false discovery rate, which offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted.
A benchmark spike‐in data set for biomarker identification in metabolomics
- Computer Science
- 2012
A publicly available metabolomic ultra performance liquid chromatography–mass spectrometry spike‐in data set for apples can serve as a test bed to assess the performance of new algorithms and compare them with previously published results.