A Critique of Differential Abundance Analysis, and Advocacy for an Alternative

Thomas P. Quinn, Elliott Gordon-Rodríguez, Ionas Erb
arXiv: Methodology
It is largely taken for granted that differential abundance analysis is, by default, the best first step when analyzing genomic data. We argue that this is not necessarily the case. In this article, we identify key limitations that are intrinsic to differential abundance analysis: it is (a) dependent on unverifiable assumptions, (b) an unreliable construct, and (c) overly reductionist. We formulate an alternative framework called ratio-based biomarker analysis which does not suffer from the… 
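
A per-feature analysis needs extra assumptions partly because sequencing reveals only relative abundances. The minimal sketch below uses hypothetical numbers and is not the authors' code. It shows that a log-ratio between two features is the same whether it is computed from absolute abundances, from counts at an arbitrary sequencing depth, or from proportions. By contrast, the value of a single feature depends on a scale factor that sequencing does not reveal.

```python
import math

def closure(values):
    """Rescale non-negative values so that they sum to 1 (proportions)."""
    total = sum(values)
    return [v / total for v in values]

def log_ratio(x, i, j):
    """Natural-log ratio between parts i and j of a sample."""
    return math.log(x[i] / x[j])

# Hypothetical absolute abundances of three features in one specimen.
absolute = [200.0, 50.0, 750.0]
# The same specimen as observed after sequencing: an unknown scale factor
# (here 0.01) is applied to every feature.
sequenced = [a * 0.01 for a in absolute]

# The unknown scale factor cancels in the ratio, so all three agree.
lr_abs = log_ratio(absolute, 0, 1)
lr_seq = log_ratio(sequenced, 0, 1)
lr_prop = log_ratio(closure(sequenced), 0, 1)
print(lr_abs, lr_seq, lr_prop)

# A single feature on its own does depend on the unknown scale factor.
print(absolute[0], sequenced[0])
```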


Learning sparse log-ratios for high-throughput sequencing data

This work presents CoDaCoRe, a novel learning algorithm that identifies sparse, interpretable, and predictive log-ratio biomarkers from HTS data by exploiting a continuous relaxation to approximate the underlying combinatorial optimization problem.
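
The idea of a "continuous relaxation" can be unpacked with a small sketch. Choosing which features go into the numerator and which go into the denominator of a log-ratio is a discrete search. If the hard 0/1 membership is replaced with sigmoid weights, the log-ratio becomes a smooth function of real-valued scores, and gradient methods can optimize it. The functions below are hypothetical. The specific weighting is an assumption for illustration, not CoDaCoRe's published formulation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hard_balance(x, numerator, denominator):
    """Log-ratio of the geometric means of two disjoint sets of parts."""
    num = sum(math.log(x[k]) for k in numerator) / len(numerator)
    den = sum(math.log(x[k]) for k in denominator) / len(denominator)
    return num - den

def soft_balance(x, a):
    """Assumed continuous relaxation (illustrative, not CoDaCoRe's code):
    part k enters the numerator with weight sigmoid(a[k]) and the
    denominator with weight sigmoid(-a[k]), so the result is smooth in a."""
    logs = [math.log(v) for v in x]
    w_num = [sigmoid(ak) for ak in a]
    w_den = [sigmoid(-ak) for ak in a]
    num = sum(w * l for w, l in zip(w_num, logs)) / sum(w_num)
    den = sum(w * l for w, l in zip(w_den, logs)) / sum(w_den)
    return num - den

x = [4.0, 1.0, 2.0, 8.0]
hard = hard_balance(x, numerator=[0, 3], denominator=[1, 2])
# Large-magnitude scores push every weight towards 0 or 1,
# recovering the discrete log-ratio {0, 3} / {1, 2}.
soft = soft_balance(x, [20.0, -20.0, -20.0, 20.0])
print(hard, soft)
```

When all scores are zero, every part gets weight 0.5 in both sets, and the relaxed log-ratio is exactly 0.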

Treating Bugs as Features: A compositional guide to the statistical analysis of the microbiome-gut-brain axis

This guidebook's supplementary materials feature an extensive, heavily annotated microbiome analysis in R. It includes a demonstration of volatility analysis and of functional gut-metabolic and gut-brain module analysis, intended as a resource for new and experienced bioinformaticians alike.

Data Augmentation for Compositional Data: Advancing Predictive Models of the Microbiome

This work extends the success of data augmentation to compositional data (i.e., simplex-valued data), which is of particular interest in the context of the human microbiome. It sets a new state of the art for key disease prediction tasks, including colorectal cancer, type 2 diabetes, and Crohn's disease.
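
To augment compositional data, the synthetic samples must stay on the simplex. One simplex-respecting way to blend two training samples, in the spirit of mixup, is to interpolate them in log-ratio (Aitchison) geometry rather than in raw proportions. The sketch below assumes this approach for illustration. `aitchison_mix` is a hypothetical helper and may differ from the paper's exact augmentation.

```python
import math

def closure(values):
    """Rescale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def aitchison_mix(x, y, lam):
    """Mix two compositions along the straight line between them in
    Aitchison (log-ratio) geometry: closure(x**lam * y**(1 - lam))."""
    return closure([xi ** lam * yi ** (1.0 - lam) for xi, yi in zip(x, y)])

x = closure([1.0, 2.0, 7.0])
y = closure([5.0, 3.0, 2.0])
z = aitchison_mix(x, y, 0.3)
print(z)
```

Every log-ratio of the mixed sample is the lam-weighted average of the corresponding log-ratios of the two parents. This property makes the interpolation natural for log-ratio models. Arithmetic averaging of proportions also stays on the simplex, but it does not have this property.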

Gut metagenome associations with extensive digital health data in a volunteer-based Estonian microbiome cohort

It is shown that long-term antibiotic usage, independent of recent administration, has a significant impact on microbiome composition, partly explaining the common associations between diseases.



Rank normalization empowers a t-test for microbiome differential abundance analysis while controlling for false discoveries

In a rigorous third-party benchmarking simulation, rank normalization is shown to offer strong control over the false discovery rate. At sample sizes greater than 50 per treatment group, it also improves performance over commonly used normalization factors paired with t-tests, Wilcoxon rank-sum tests, and methods implemented in R packages.

Finding the centre: corrections for asymmetry in high-throughput sequencing datasets

This work extends a previously described log-ratio transformation method that allows variables to be compared between samples in a Bayesian compositional context. It demonstrates the pathology in modelled and real unbalanced experimental designs, showing that asymmetry can dramatically increase both false-negative and false-positive inferences.
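
The asymmetry problem can be sketched with a toy comparison between two samples. The centred log-ratio (CLR) divides each feature by the geometric mean of all features, so when one feature changes strongly, the CLR values of every other feature shift too. Dividing instead by the geometric mean of a reference set of unchanged features removes this artefact. The helper and the reference set below are hypothetical. In real data the invariant features are unknown, which is why such methods rely on data-driven choices of denominator.

```python
import math

def log_ratio_to_reference(x, reference):
    """Log of each part divided by the geometric mean of the parts whose
    indices are in `reference`. With all indices this is the CLR."""
    logs = [math.log(v) for v in x]
    centre = sum(logs[k] for k in reference) / len(reference)
    return [l - centre for l in logs]

sample_a = [10.0, 20.0, 30.0, 40.0, 50.0]
sample_b = [1000.0, 20.0, 30.0, 40.0, 50.0]  # only feature 0 changes (100-fold)

everything = list(range(5))   # CLR: reference = all features
reference = [1, 2, 3, 4]      # hypothetical: features assumed not to change

clr_diff = [b - a for a, b in zip(log_ratio_to_reference(sample_a, everything),
                                  log_ratio_to_reference(sample_b, everything))]
ref_diff = [b - a for a, b in zip(log_ratio_to_reference(sample_a, reference),
                                  log_ratio_to_reference(sample_b, reference))]

# Under the CLR, features 1-4 appear to decrease by log(100)/5 each;
# with the invariant reference they show no change.
print(clr_diff)
print(ref_diff)
```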

A field guide for the compositional analysis of any-omics data

Next-generation sequencing (NGS) has made it possible to determine the sequence and relative abundance of all nucleotides in a biological or environmental sample. Today, NGS is routinely used to

Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection

This work emphasizes the relative nature of health biomarkers, discusses the literature on classifying relative data, and benchmarks how different transformations perform for regularized logistic regression across multiple biomarker types.
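
For reference, balance-selection methods typically choose among log contrasts of the following standard form, the isometric log-ratio "balance" between a numerator set $I_+$ of $r$ parts and a disjoint denominator set $I_-$ of $s$ parts. The summary does not say whether this paper keeps the normalizing constant $\sqrt{rs/(r+s)}$, so its inclusion here is an assumption.

```latex
B(\mathbf{x}) = \sqrt{\frac{rs}{r+s}}\;
  \ln \frac{\left(\prod_{i \in I_+} x_i\right)^{1/r}}
           {\left(\prod_{j \in I_-} x_j\right)^{1/s}},
\qquad |I_+| = r,\quad |I_-| = s,\quad I_+ \cap I_- = \varnothing
```

In log space each part in $I_+$ receives coefficient $+\sqrt{rs/(r+s)}/r$ and each part in $I_-$ receives $-\sqrt{rs/(r+s)}/s$. These coefficients sum to zero, so any common scale factor on $\mathbf{x}$ cancels, and $B$ is a true log contrast.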

LinDA: linear models for differential abundance analysis of microbiome compositional data

This work shows that LinDA enjoys asymptotic FDR control and can be extended to mixed-effect models for correlated microbiome data. It also demonstrates LinDA's effectiveness.

Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible

The authors advocate that investigators avoid rarefying altogether. They provide statistical theory supporting an approach that uses an appropriate mixture model to account simultaneously for library-size differences and biological variability.
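
Rarefying, the practice criticized here, subsamples every library without replacement down to a common depth. The sketch below uses hypothetical counts and a hypothetical helper name. It shows the procedure and makes visible how many reads are discarded.

```python
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a vector of counts."""
    if depth > sum(counts):
        raise ValueError("depth exceeds library size")
    # Expand counts into one feature label per read, then draw without replacement.
    reads = [k for k, c in enumerate(counts) for _ in range(c)]
    kept = rng.sample(reads, depth)
    out = [0] * len(counts)
    for k in kept:
        out[k] += 1
    return out

rng = random.Random(0)
library = [500, 300, 150, 50]   # hypothetical sample with 1000 reads
rarefied = rarefy(library, 100, rng)
print(rarefied, "reads discarded:", sum(library) - sum(rarefied))
```

Here 900 of the 1000 reads are thrown away. The outcome also depends on the random seed, which adds sampling noise that the original library did not contain.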

Differential proportionality – a normalization-free approach to differential gene expression

A novel method is proposed in which sample normalization is unnecessary, yet important insights can still be obtained. A moderated statistic can be derived for it in the same way as the one that follows from a hierarchical model for individual genes.
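
Proportionality between two features is commonly measured with the coefficient rho = 1 - var(clr_i - clr_j) / (var(clr_i) + var(clr_j)), computed on centred log-ratio values. Only within-sample log-ratios enter this formula, so no normalization factor is needed. A simplified, hypothetical view of differential proportionality is to compute rho separately in each group and compare the two values. The paper's moderated statistic is not reproduced here, and the data are made up.

```python
import math
import statistics

def clr(sample):
    """Centred log-ratio: log of each part minus the mean log of the sample."""
    logs = [math.log(v) for v in sample]
    centre = sum(logs) / len(logs)
    return [l - centre for l in logs]

def rho(samples, i, j):
    """Proportionality coefficient between parts i and j across samples."""
    z = [clr(s) for s in samples]
    zi = [row[i] for row in z]
    zj = [row[j] for row in z]
    diff = [a - b for a, b in zip(zi, zj)]
    return 1.0 - statistics.variance(diff) / (statistics.variance(zi) + statistics.variance(zj))

# Group A: part 1 is always twice part 0 (perfectly proportional).
group_a = [[1, 2, 5], [2, 4, 5], [4, 8, 3], [3, 6, 9]]
# Group B: parts 0 and 1 vary independently.
group_b = [[1, 8, 5], [2, 2, 5], [4, 1, 3], [3, 6, 9]]

rho_a = rho(group_a, 0, 1)
rho_b = rho(group_b, 0, 1)
print(rho_a, rho_b)   # a large gap between groups flags differential proportionality
```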

Comparison of normalization methods for the analysis of metagenomic gene abundance data

This study emphasizes the importance of selecting a suitable normalization method when analyzing shotgun metagenomics data. It demonstrates that improper methods may produce unacceptably high levels of false positives, which in turn may lead to incorrect or obfuscated biological interpretation.
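
A toy example shows how an unsuitable normalization can produce false positives. Under total-sum scaling, each sample is divided by its library size, so a bloom in one feature makes every other feature appear to decrease. The counts are hypothetical, and total-sum scaling stands in for normalization methods generally; the study itself compared several methods.

```python
def total_sum_scale(counts):
    """Divide each count by the sample's total (total-sum scaling)."""
    total = sum(counts)
    return [c / total for c in counts]

control = [100, 100, 100, 100]
treated = [700, 100, 100, 100]   # only feature 0 truly changes

p_control = total_sum_scale(control)
p_treated = total_sum_scale(treated)
# Apparent fold change of each feature after normalization.
fold = [t / c for t, c in zip(p_treated, p_control)]
print(fold)
```

Features 1 to 3 appear to fall to 0.4 times their control level, even though their counts did not change. A per-feature test on these values would report them as false positives.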

Understanding sequencing data as compositions: an outlook and review

The principles of compositional data analysis (CoDA) are summarized, evidence is provided for why sequencing data are compositional, methods available for analyzing sequencing data are discussed, and future directions for this field of study are highlighted.
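
Subcompositional coherence is one of the CoDA principles, and it can be checked directly. When a subset of features is analysed and re-closed, the proportions change, but the log-ratios between the retained features do not. The numbers below are hypothetical.

```python
import math

def closure(values):
    """Rescale positive values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

full = closure([10.0, 30.0, 60.0])   # proportions 0.1, 0.3, 0.6
sub = closure(full[:2])              # keep the first two parts and re-close

# The proportion of part 0 depends on which other parts were measured...
print(full[0], sub[0])
# ...but the log-ratio between parts 0 and 1 is the same in both.
print(math.log(full[0] / full[1]), math.log(sub[0] / sub[1]))
```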

Counts: an outstanding challenge for log-ratio analysis of compositional data in the molecular biosciences

Here it is shown that count data carry additive (quantization) variation arising from their discrete nature. This comes in addition to (biological) variation in the system under study and (technical) variation from measurement and analysis processes.
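
The quantization effect is easy to see in simulation. Below, two features share the same true log-ratio at every sequencing depth, yet the observed log-ratio scatters far more when counts are small. Binomial read sampling and the pseudocount of 0.5 are simplifying assumptions chosen for illustration, not the paper's model.

```python
import math
import random
import statistics

def simulate_log_ratios(p, depth, reps, rng, pseudocount=0.5):
    """Draw `depth` reads from a two-part composition (p, 1 - p), `reps` times,
    and return the observed log-ratio of the two counts (pseudocount added)."""
    out = []
    for _ in range(reps):
        a = sum(1 for _ in range(depth) if rng.random() < p)
        b = depth - a
        out.append(math.log((a + pseudocount) / (b + pseudocount)))
    return out

rng = random.Random(1)
low = simulate_log_ratios(0.2, 20, 300, rng)     # shallow sequencing
high = simulate_log_ratios(0.2, 2000, 300, rng)  # deep sequencing
print(statistics.variance(low), statistics.variance(high))
```

For binomial sampling, a delta-method approximation puts the variance near 1/(depth * p * (1 - p)). For p = 0.2 that is about 0.31 at depth 20 and about 0.003 at depth 2000. At low depth the approximation is rougher because of zeros and the pseudocount.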