• Corpus ID: 3436138

Statistical Methods and Workflow for Analyzing Human Metabolomics Data.

Authors: Joseph Antonelli, Brian Lee Claggett, Mir Henglin, Jeramie D. Watrous, Kim Lehmann, Pavel Hushcha, Olga V. Demler, Samia Mora, Teemu J. Niiranen, Alexandre C. Pereira, Mohit M. Jain, Susan Cheng
Published in: arXiv: Quantitative Methods

High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity and mechanisms underlying human health and disease. Large-scale metabolomics data, generated using targeted or nontargeted platforms, are increasingly common. Appropriate statistical analysis of these complex high-dimensional data is critical for extracting meaningful results from such large-scale human metabolomics studies…

LC-MS/MS metabolomics-facilitated identification of the active compounds responsible for anti-allergic activity of the ethanol extract of Xenostegia tridentata
The ethyl acetate subfraction showed the highest anti-allergic activity among the various subfractions and better activity than the crude extract, consistent with the high abundance of total phenolic and flavonoid contents in this subfraction.


Mixture model normalization for non-targeted gas chromatography/mass spectrometry metabolomics data
When quality control samples are systematically included in batches, mixnorm is uniquely suited to normalizing non-targeted GC/MS metabolomics data due to explicit accommodation of batch effects, run order and varying thresholds of detectability.
Centering, scaling, and transformations: improving the biological information content of metabolomics data
Range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis).
Using MetaboAnalyst 3.0 for Comprehensive Metabolomics Data Analysis
This unit provides an overview of the main functional modules and the general workflow of the latest version of MetaboAnalyst (MetaboAnalyst 3.0), followed by eight detailed protocols.
Web-based inference of biological patterns, functions and pathways from metabolomic data using MetaboAnalyst
This protocol provides a stepwise description of how to format and upload data to MetaboAnalyst, how to process and normalize data, how to identify significant features and patterns through univariate and multivariate statistical methods, and how to use metabolite set enrichment analysis and metabolic pathway analysis to help elucidate possible biological mechanisms.
A Sparse PLS for Variable Selection when Integrating Omics Data
This study focuses on the integration of two-block data that are measured on the same samples and shows that sparse PLS provides a valuable variable selection tool for highly dimensional data sets.
Improved batch correction in untargeted MS-based metabolomics
This paper compares several batch correction methods, investigates the effect of different strategies for handling non-detects, and assesses the merits of these batch correction strategies using three large LC–MS and GC–MS data sets of samples from Arabidopsis thaliana.
Metabolite profiling and cardiovascular event risk: a prospective study of 3 population-based cohorts.
The value of high-throughput metabolomics for biomarker discovery and improved risk assessment is substantiated and the value of net reclassification was particularly improved for persons in the 5% to 10% risk range.
Large-scale untargeted LC-MS metabolomics data correction using between-batch feature alignment and cluster-based within-batch signal intensity drift correction
These procedures address and correct for within- and between-batch variability when processing multiple-batch untargeted LC-MS metabolomics data, and they provide unbiased measures of improved data quality, with implications for improved data analysis.
Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework and has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets.
A batch correction method for liquid chromatography–mass spectrometry data that does not depend on quality control samples
The use of QC samples is shown to lead to problems. Non-QC correction methods are compared with standard QC correction and are shown to reduce differences between replicate samples, with the potential to highlight differences between experimental groups previously hidden by instrumental variation.
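
For readers unfamiliar with the two scaling methods named in the "Centering, scaling, and transformations" entry above, the following is a minimal sketch of autoscaling and range scaling under their usual definitions. It is not taken from any of the listed papers: the function names and the toy data matrix are hypothetical, and the sketch assumes rows are samples, columns are metabolites, and no column is constant.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column, then divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    # ddof=1 gives the sample standard deviation; a constant column would divide by zero.
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def range_scale(X):
    """Mean-center each column, then divide by its range (max minus min)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Hypothetical toy data: 4 samples (rows) x 2 metabolites (columns).
# The second metabolite is far more abundant than the first.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0],
              [4.0, 700.0]])

auto = autoscale(X)
ranged = range_scale(X)
print(auto)
print(ranged)
```

In this toy example the two columns differ only by a linear rescaling, so both methods map them to identical values, which illustrates how either scaling removes the dependence on average concentration.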