Methods for phylogenetic analysis of microbiome data

@article{Washburne2018MethodsFP,
  title={Methods for phylogenetic analysis of microbiome data},
  author={Alex D. Washburne and James T. Morton and Jon G. Sanders and Daniel McDonald and Qiyun Zhu and Angela M Oliverio and Rob Knight},
  journal={Nature Microbiology},
  year={2018},
  volume={3},
  pages={652--661}
}
How does knowing the evolutionary history of microorganisms affect our analysis of microbiological datasets? Depending on the research question, the common ancestry of microorganisms can be a source of confounding variation, or a scaffolding used for inference. For example, when performing regression on traits, common ancestry is a source of dependence among observations, whereas when searching for clades with correlated abundances, common ancestry is the scaffolding for inference. The common… 
Transformation and differential abundance analysis of microbiome data incorporating phylogeny
TLDR
This paper proposes a model-based approach for microbiome data transformation and a phylogenetically informed procedure for differential abundance (DA) testing based on the transformed data. It also proposes adaptive analysis of composition of microbiomes (adaANCOM) for DA testing, which constructs log-ratios adaptively on the tree for each taxon.
Phylogeny-corrected identification of microbial gene families relevant to human gut colonization
TLDR
This work uses metagenomics with phylogenetic modeling to identify gene families associated with higher prevalence in patients with Crohn’s disease, including Proteobacterial genes involved in conjugation and fimbria regulation, processes previously linked to inflammation.
Hypothesis testing for phylogenetic composition: a minimum-cost flow perspective.
TLDR
This work proposes a new maximum-type test, the detector of active flow on a tree, and investigates its properties, showing that the proposed method is particularly powerful against sparse differences in phylogenetic composition and enjoys certain optimality.
Decoding the Language of Microbiomes: Leveraging Patterns in 16S Public Data using Word-Embedding Techniques and Applications in Inflammatory Bowel Disease
TLDR
This paper shows that predictive models trained using property data are the most accurate, robust, and generalizable, and that property-based models can be trained on one dataset and deployed on another with positive results.
Incorporating genome-based phylogeny and functional similarity into diversity assessments helps to resolve a global collection of human gut metagenomes
TLDR
Tree-based measures greatly improved machine learning model performance for predicting westernization, disease status, and gender, relative to models trained solely on tree-agnostic measures, and ecophylogenetic and functional diversity measures were generally the most important features for predictive performance.
phylogenize: a web tool to identify microbial genes underlying environment associations
TLDR
This work presents phylogenize, a web server that allows researchers to apply phylogenetic regression to 16S amplicon as well as shotgun sequencing data and to visualize results, and shows that phylogenize draws similar conclusions from 16S and from shotgun sequencing.
A Phylogeny-based Test of Mediation Effect in Microbiome
Recent studies suggest that the microbiome can be an important mediator in the effect of a treatment on an outcome. Microbiome data generated from sequencing experiments contain the relative
A mixed model approach for estimating drivers of microbiota community composition and differential taxonomic abundance
TLDR
A novel GLMM-based approach for analysing the taxon-specific sequence read counts derived from standard meta-barcoding data is described and illustrated, and it is shown how these models can be used to determine the degree to which specific taxa or taxonomic groups are responsible for variance attributed to different drivers.
Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease
TLDR
It is shown that predictive models trained using property data are the most accurate, robust, and generalizable, and that property-based models can be trained on one dataset and deployed on another with positive results.
Survey of metaproteomics software tools for functional microbiome analysis
TLDR
The performance of six available metaproteomics software tools is explored to enable researchers to make informed decisions regarding software choice based on their research goals, and developers can use the resulting feedback to further optimize their algorithms.

References

Microbiomes in light of traits: A phylogenetic perspective
TLDR
Key aspects of microbial traits are reviewed and a synthesis of these studies reveals that, despite the promiscuity of HGT, microbial traits appear to be phylogenetically conserved, or not distributed randomly across the tree of life.
Phylogeny-corrected identification of microbial gene families relevant to human gut colonization
TLDR
This work uses metagenomics with phylogenetic modeling to identify gene families associated with higher prevalence in patients with Crohn’s disease, including Proteobacterial genes involved in conjugation and fimbria regulation, processes previously linked to inflammation.
A phylogenetic transform enhances analysis of compositional microbiota data
TLDR
The PhILR transform is introduced, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys and demonstrates that analyses of community-level structure can be applied to PhILR-transformed data with performance on benchmarks rivaling or surpassing standard tools.
Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets
TLDR
This work uses the method, “phylofactorization,” to re-analyze datasets from the human body and soil microbial communities, demonstrating how phylofactorization is a dimensionality-reducing tool, an ordination-visualization tool, and an inferential tool for identifying edges in the phylogeny along which putative functional ecological traits may have arisen.
Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences
TLDR
The results demonstrate that phylogeny and function are sufficiently linked that this 'predictive metagenomic' approach should provide useful insights into the thousands of uncultivated microbial communities for which only marker gene surveys are currently available.
Phylogenies and the Comparative Method: A General Approach to Incorporating Phylogenetic Information into the Analysis of Interspecific Data
TLDR
A generalized linear model (GLM) is presented for the analysis of comparative data, which can be used to address questions regarding the relationship between traits or between traits and environments, the rate of phenotypic evolution, the degree of phylogenetic effect, and the ancestral state of a character.
Beyond classification: gene-family phylogenies from shotgun metagenomic reads enable accurate community analysis
TLDR
Bacterial community diversity can be quantified using phylogenetic approaches applied to shotgun metagenomic data using a software pipeline to generate in silico bacterial communities and a profile-based alignment of the reads from which a gene-family phylogenetic tree can be built.
UniFrac: a New Phylogenetic Method for Comparing Microbial Communities
TLDR
The results illustrate that UniFrac provides a new way of characterizing microbial communities, using the wealth of environmental rRNA sequences, and allows quantitative insight into the factors that underlie the distribution of lineages among environments.
Species divergence and the measurement of microbial diversity.
TLDR
Divergence-based methods are providing new insights into microbial community structure and function because microorganisms in a community differ dramatically in sequence similarity, which also often correlates with phenotypic similarity in key features such as metabolic capabilities.
Fast UniFrac: Facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data
TLDR
The potential of Fast UniFrac is shown using examples from three data types: Sanger-sequencing studies of diverse free-living and animal-associated bacterial assemblages and from the gut of obese humans as they diet, pyrosequencing data integrated from studies of the human hand and gut, and PhyloChip data from a study of citrus pathogens.