Thresholding for biomarker selection in multivariate data using Higher Criticism.

@article{Wehrens2012ThresholdingFB,
  title={Thresholding for biomarker selection in multivariate data using Higher Criticism},
  author={Ron Wehrens and Pietro Franceschi},
  journal={Molecular BioSystems},
  year={2012},
  volume={8},
  number={9},
  pages={2339--2346}
}
Biomarker selection is an important topic in the omics sciences, where holistic measurement methods routinely generate results for many variables simultaneously. Very often, only a small fraction of these variables are really associated with the phenomena of interest. Selection and identification of these biomarkers is essential for obtaining an understanding of the complex biological processes under study. Finding biomarkers, however, is a difficult task. Even if a relative order can be… 
Meta-Statistics for Variable Selection: The R Package BioMark
TLDR
An R package is proposed, BioMark, implementing two meta-statistics for variable selection, each of which presents a data-dependent selection threshold for significance, and it is shown using experimental spike-in data from the field of metabolomics that both approaches work well with real data.
Reflections on univariate and multivariate analysis of metabolomics data
TLDR
Applications of the t test, analysis of variance, principal component analysis and partial least squares discriminant analysis are shown on both real and simulated metabolomics data examples to provide an overview of fundamental aspects of univariate and multivariate methods.
Higher Criticism for Large-Scale Inference: especially for Rare and Weak effects
TLDR
The Rare/Weak (RW) model is a theoretical framework that simultaneously controls the size and prevalence of useful/significant items among the useless/null bulk. Within this framework, HC is shown to have important advantages over better-known procedures such as False Discovery Rate (FDR) control and Family-wise Error control (FwER), in particular certain optimality properties.
Analysis of the Human Adult Urinary Metabolome Variations with Age, Body Mass Index, and Gender by Implementing a Comprehensive Workflow for Univariate and OPLS Statistical Analyses.
TLDR
The impact of gender and age on the urinary metabolome is highlighted, and thus it indicates that these factors should be taken into account for the design of metabolomics studies.
Data Fusion in Metabolomics and Proteomics for Biomarker Discovery.
TLDR
This work describes a framework for combining multiple data sets provided by different analytical platforms, and explains how the obtained latent variables are fused and further analyzed.
Statistical Analysis in Proteomics
  • K. Jung
  • Biology
    Methods in Molecular Biology
  • 2016
TLDR
The basic concepts behind proteomics mass spectrometry and the accompanying topic of protein and peptide separations are presented, with a focus on the properties of datasets emerging from such studies.
The power of tests for signal detection in high-dimensional data
In this thesis, we are interested in the testing problem, whether there are rare and weak signals (alternative) or no signals (null) within white noise background. To be more specific, we study the
Signal detection in extracellular neural ensemble recordings using higher criticism
TLDR
A robust strategy for detecting signals in broadband and noisy time series such as spikes, sharp waves and multi-unit activity data that is solely based on the intrinsic statistical distribution of the recorded data is presented.
Cutaneous expressions of interleukin-6 and neutrophil elastase as well as levels of serum IgA antibodies to gliadin nonapeptides, tissue transglutaminase and epidermal transglutaminase: implications for both autoimmunity and autoinflammation involvement in dermatitis herpetiformis
TLDR
The results might indicate the heterogenetic nature of DH pathogenesis, further suggesting that both autoimmune and autoinflammatory phenomena may be involved in DH cutaneous pathology.

References

Gene ranking and biomarker discovery under correlation
TLDR
A simple procedure is proposed that adjusts gene-wise t-statistics to take account of correlations among genes; it improves estimation of gene orderings and leads to higher power for fixed true discovery rate, and vice versa.
Higher criticism thresholding: Optimal feature selection when useful features are rare and weak
TLDR
In the most challenging RW settings, HCT uses an unconventionally low threshold, which keeps the missed-feature detection rate under better control than FDRT and yields a classifier with improved misclassification performance.
Metabolic profiling and the metabolome-wide association study: significance level for biomarker identification.
TLDR
The results show that the MWSL approach as estimated by the univariate t test is not outperformed by OPLS and offers a fast and simple method to detect disease-related discriminatory features in human NMR urinary metabolic profiles.
Classification and biomarker identification using gene network modules and support vector machines
TLDR
It is demonstrated that more than 90% accuracy can be obtained in classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays.
Classification of Genes and Putative Biomarker Identification Using Distribution Metrics on Expression Profiles
TLDR
More than 16,000 genes from 2,145 mouse array samples were systematically classified and searched for biomarkers, and GEPs were identified as tissue-specific biomarker candidate genes.
A bioinformatics approach for biomarker identification in radiation-induced lung inflammation from limited proteomics data.
TLDR
A novel bioinformatics ranking algorithm, built on a graph-based scoring function, is proposed to rank and identify the most robust biomarkers from limited proteomics data; the results suggest that the methodology is a potentially promising approach for the challenging problem of identifying relevant biomarkers in sample-limited clinical applications.
Statistical significance for genomewide studies
TLDR
This work proposes an approach to measuring statistical significance in genomewide studies based on the concept of the false discovery rate, which offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted.
A benchmark spike‐in data set for biomarker identification in metabolomics
TLDR
A publicly available metabolomic ultra performance liquid chromatography–mass spectrometry spike‐in data set for apples can serve as a test bed to assess the performance of new algorithms and compare them with previously published results.