A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data

@article{Xiao2018APS,
  title={A Phylogeny-Regularized Sparse Regression Model for Predictive Modeling of Microbial Community Data},
  author={Jian Xiao and Li Chen and Yue Yu and Xianyang Zhang and Jun Chen},
  journal={Frontiers in Microbiology},
  year={2018},
  volume={9}
}
Fueled by technological advancement, there has been a surge of human microbiome studies surveying the microbial communities associated with the human body and their links with health and disease. As a complement to the human genome, the human microbiome holds great potential for precision medicine. Efficient predictive models based on microbiome data could potentially be used in various clinical applications such as disease diagnosis, patient stratification and drug response prediction. One…

A novel deep learning method for predictive modeling of microbiome data
TLDR
This work proposes MDeep (microbiome-based deep learning method), a novel deep learning method for predicting both continuous and binary outcomes, and demonstrates that MDeep outperforms competing methods in both regression and binary classification.
RFtest: A Robust and Flexible Community-Level Test for Microbiome Data Powerfully Detects Phylogenetically Clustered Signals
TLDR
This study proposes the “Random Forest Test” (RFtest), a global (community-level) test based on random forest for high-dimensional and phylogenetically structured microbiome data; it is a permutation test that uses the generalization error of random forest as the test statistic.
Microbiome compositional analysis with logistic-tree normal models
TLDR
This work introduces a generative model, called the “logistic-tree normal” (LTN) model, that marries two popular classes of models—namely, log-ratio normal (LN) and Dirichlet-tree (DT)—and inherits the key benefits of each.
A Review and Tutorial of Machine Learning Methods for Microbiome Host Trait Prediction
TLDR
The most commonly used machine learning methods are explored, and their prediction accuracy as applied to microbiome host trait prediction is evaluated.
Sparse least trimmed squares regression with compositional covariates for high-dimensional data
TLDR
The numerical performance of the proposed method is evaluated via simulation studies, and its usefulness is illustrated by an application to a microbiome study with the aim to predict caffeine intake based on the human gut microbiome composition.
Feature selection and causal analysis for microbiome studies in the presence of confounding using standardization
TLDR
Standardization enables more accurate identification of individual microbiome features with an effect on the outcome of interest compared to other variable selection and estimation procedures when there is confounding by a categorical variable.
Principal Amalgamation Analysis for Microbiome Data
TLDR
This work proposes Principal Amalgamation Analysis (PAA), a novel amalgamation-based and taxonomy-guided dimension reduction paradigm for microbiome data that aims to aggregate the compositions into a smaller number of principal compositions, guided by the available taxonomic structure.
Image and graph convolution networks improve microbiome-based machine learning accuracy
TLDR
Two novel methods to combine information from different bacteria and improve data representation for machine learning using bacterial taxonomy are suggested and it is shown that both algorithms improve performance of static 16S rRNA gene sequence-based machine learning compared to the best state-of-the-art methods.
Multi-omic Characterization of the Taxa-function Relationship in Infant Gut Microbiomes
TLDR
The results suggest a degree of overall association between taxonomic profiles and metabolite concentrations, but lack of predictive capacity for stool metabolic signatures reflects, in part, the possible role of functional redundancy in defining the taxa-function relationship in early life as well as the bidirectional nature of the microbiome-metabolome association.

References

Phylogeny-Based Kernels with Application to Microbiome Association Studies
TLDR
A three-parameter phylogeny-based kernel is provided that allows modeling a wide range of nonlinear relationships; it has a nice biological interpretation, and tuning the parameter can yield insights about how the microbiome interacts with the environment.
Constructing Predictive Microbial Signatures at Multiple Taxonomic Levels
TLDR
This work introduces the concept of variable fusion for high-dimensional compositional data and proposes a novel tree-guided variable fusion method that incorporates the tree information node-by-node and is capable of building predictive models comprised of bacterial taxa at different taxonomic levels.
False discovery rate control incorporating phylogenetic tree increases detection power in microbiome‐wide multiple testing
TLDR
A new FDR control procedure is proposed that incorporates the prior structure information and applies it to microbiome data and achieves a similar power as traditional procedures that do not take into account the tree structure.
Phylogeny-based classification of microbial communities
TLDR
A novel supervised classification method for microbial community samples, where each sample is represented as a set of OTU frequencies, that uses the natural structure in microbial community data encoded by a phylogenetic tree to take advantage of environment-specific compositional patterns that may contain features at multiple granularity levels.
Supervised classification of human microbiota.
TLDR
This review demonstrates that several existing supervised classifiers can be applied effectively to microbiota classification, both for selecting subsets of taxa that are highly discriminative of the type of community, and for building models that can accurately classify unlabeled data.
Phylogenetic approaches to microbial community classification
TLDR
The classification of oral microbiota remains a challenging problem; the best accuracy on the plaque dataset was approximately 81%.
Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights
TLDR
A computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers, is developed, which can be considered a first step toward defining general microbial dysbiosis.
Microbiomes in light of traits: A phylogenetic perspective
TLDR
Key aspects of microbial traits are reviewed and a synthesis of these studies reveals that, despite the promiscuity of HGT, microbial traits appear to be phylogenetically conserved, or not distributed randomly across the tree of life.
Gut microbiome-host interactions in health and disease
TLDR
Recent metagenomic and metabonomic approaches that have enabled advances in understanding gut microbiome activity in relation to human health, and gut microbial modulation for the treatment of disease are reviewed.
Identification of important regressor groups, subgroups and individuals via regularization methods: application to gut microbiome data
TLDR
A new variable selection method is developed that identifies important features at multiple taxonomy levels and outperformed competing methods: it more often selected significant variables, and had small false discovery rates and acceptable false-positive rates.