Corpus ID: 219179771

DeepCoDA: personalized interpretability for compositional health data

T. Quinn, Dang Nguyen, Santu Rana, Sunil Gupta, Svetha Venkatesh
Interpretability allows the domain-expert to directly evaluate the model's relevance and reliability, a practice that offers assurance and builds trust. In the healthcare setting, interpretable models should implicate relevant biological mechanisms independent of technical factors like data pre-processing. We define personalized interpretability as a measure of sample-specific feature attribution, and view it as a minimum requirement for a precision health model to justify its conclusions. Some…
A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research
This field guide will help researchers more effectively design transparently interpretable models, and thus enable them to use deep learning for scientific discovery.
Learning Sparse Log-Ratios for High-Throughput Sequencing Data
This work presents CoDaCoRe, a novel learning algorithm that identifies sparse, interpretable, and predictive log-ratio biomarkers from HTS data by exploiting a continuous relaxation to approximate the underlying combinatorial optimization problem.
A causal view on compositional data
This work provides a causal view on compositional data in an instrumental variable setting where the composition acts as the cause. It advocates for multivariate alternatives using statistical data transformations and regression techniques that take the special structure of the compositional sample space into account.
Swag: A Wrapper Method for Sparse Learning
This work studies a procedure that combines screening and wrapper methods to find a library of extremely low-dimensional attribute combinations that match or improve the predictive performance of any particular learning method that uses all attributes as input (including sparse learners).
Interpretability and Explainability: A Machine Learning Zoo Mini-tour
This review examines the problem of designing interpretable and explainable machine learning models, emphasises the divide between interpretability and explainability, and illustrates these two research directions with concrete examples of the state of the art.


Interpretable Log Contrasts for the Classification of Health Biomarkers: a New Approach to Balance Selection
The relative nature of health biomarkers is emphasized, the literature surrounding the classification of relative data is discussed, and the performance of different transformations for regularized logistic regression is benchmarked across multiple biomarker types.
DeepTRIAGE: Interpretable and Individualised Biomarker Scores using Attention Mechanism for the Classification of Breast Cancer Sub-types
This paper proposes a novel deep learning architecture, called DeepTRIAGE (Deep learning for the TRactable Individualised Analysis of Gene Expression), which not only classifies cancer sub-types with good accuracy, but simultaneously assigns each patient their own set of interpretable and individualised biomarker scores.
DeepTRIAGE: interpretable and individualised biomarker scores using attention mechanism for the classification of breast cancer sub-types
A novel deep learning architecture, called DeepTRIAGE (Deep learning for the TRactable Individualised Analysis of Gene Expression), uses an attention mechanism to obtain personalised biomarker scores that describe how important each gene is in predicting the cancer sub-type for each sample.
Using balances to engineer features for the classification of health biomarkers: a new approach to balance selection
The relative nature of health biomarkers is emphasized, how one could use balances to engineer features prior to classification is explored, and a simple procedure is proposed to select discriminative 2- and 3-part balances.
Representation Learning of Compositional Data
This work focuses on principal component analysis (PCA) and proposes an approach that allows low-dimensional representation learning directly from the original data. It includes a convenient surrogate (upper bound) loss for exponential family PCA that has an easy-to-optimize form.
Towards Robust Interpretability with Self-Explaining Neural Networks
This work designs self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models, and proposes three desiderata for explanations in general – explicitness, faithfulness, and stability.
Deep in the Bowel: Highly Interpretable Neural Encoder-Decoder Networks Predict Gut Metabolites from Gut Microbiome
This work proposes a sparse neural encoder-decoder network to predict metabolite abundances from microbe abundances using paired data from a cohort of inflammatory bowel disease (IBD) patients and shows that the model outperforms linear univariate and multivariate methods in terms of accuracy, sparsity, and stability.
Understanding sequencing data as compositions: an outlook and review
The principles of compositional data analysis (CoDA) are summarized, evidence is provided for why sequencing data are compositional, methods available for analyzing sequencing data are discussed, and future directions with regard to this field of study are highlighted.
Statistical Analysis of Metagenomics Data
  • M. L. Calle
  • Computer Science, Medicine
  • Genomics & informatics
  • 2019
This review outlines some of the procedures most commonly used for microbiome analysis that are implemented in R packages, and describes the principles of compositional data analysis.
Variable selection in microbiome compositional data analysis
A reproducible vignette is provided for the application of selbal, a forward selection approach for the identification of compositional balances, and of clr-lasso and coda-lasso, two penalized regression models for compositional data analysis, to enable researchers to fully leverage their potential in microbiome studies.