Corpus ID: 220041787

Generative causal explanations of black-box classifiers

@article{OShaughnessy2020GenerativeCE,
  title={Generative causal explanations of black-box classifiers},
  author={Matthew R. O'Shaughnessy and Gregory Canal and Marissa Connor and M. Davenport and C. Rozell},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.13913}
}
We develop a method for generating causal post-hoc explanations of black-box classifiers based on a learned low-dimensional representation of the data. The explanation is causal in the sense that changing learned latent factors produces a change in the classifier output statistics. To construct these explanations, we design a learning framework that leverages a generative model and information-theoretic measures of causal influence. Our objective function encourages both the generative model to…
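The causal-influence term in that objective can be made concrete: alongside a data-fidelity term for the generative model, the paper adds an information-theoretic measure of how strongly a subset of latent factors controls the classifier output. Below is a minimal numpy sketch of one way to estimate such a quantity (the mutual information between designated latents and the classifier's prediction) by Monte Carlo; decoder and classifier are assumed stand-ins for the learned generative model and the black-box classifier, and the estimator is illustrative rather than the authors' implementation.

    import numpy as np

    def causal_influence(decoder, classifier, k, latent_dim,
                         n_outer=64, n_inner=64):
        # Monte Carlo estimate of I(alpha; Yhat): the mutual information
        # between the first k latent factors alpha and the classifier
        # output Yhat, marginalizing over the remaining latents beta.
        def class_probs(alpha):
            beta = np.random.randn(n_inner, latent_dim - k)
            z = np.hstack([np.tile(alpha, (n_inner, 1)), beta])
            return classifier(decoder(z)).mean(axis=0)  # p(Yhat | alpha)

        cond = np.array([class_probs(np.random.randn(k))
                         for _ in range(n_outer)])
        marg = cond.mean(axis=0)                        # p(Yhat)
        eps = 1e-12
        h_cond = -(cond * np.log(cond + eps)).sum(axis=1).mean()
        h_marg = -(marg * np.log(marg + eps)).sum()
        return h_marg - h_cond          # I = H(Yhat) - H(Yhat | alpha)

Maximizing a term of this form pushes the designated latents to control the classifier's decision, while the data-fidelity term keeps the decoder faithful to the data distribution.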
Instance-wise Causal Feature Selection for Model Interpretation
TLDR
A causal extension to the recently introduced paradigm of instance-wise feature selection is formulated to explain black-box visual classifiers, and the efficacy of this approach is shown on multiple vision datasets by measuring the post-hoc accuracy and Average Causal Effect of selected features on the model's output.
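One rough way to read the Average Causal Effect metric mentioned above: intervene by keeping only the selected features and compare the classifier's output against a fully ablated reference input. The masking scheme and zero baseline in this sketch are assumptions for illustration, not necessarily the paper's exact estimator.

    import numpy as np

    def average_causal_effect(model, X, mask, baseline=0.0, cls=0):
        # Approximate do(X_S = x_S): keep the selected features
        # (mask == 1) and replace the complement with a reference value.
        X_do = np.where(mask, X, baseline)
        # Reference input with every feature ablated to the baseline.
        X_ref = np.full_like(X, baseline)
        # Effect on class `cls`: mean shift in predicted probability.
        return model(X_do)[:, cls].mean() - model(X_ref)[:, cls].mean()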
Concept Bottleneck Models
TLDR
On x-ray grading and bird identification, concept bottleneck models achieve competitive accuracy with standard end-to-end models, while enabling interpretation in terms of high-level clinical concepts (“bone spurs”) or bird attributes (“wing color”).
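The architecture behind this result is a hard bottleneck: the input is first mapped to an interpretable concept vector, and the label is predicted from the concepts alone, so a human can inspect or intervene on the concepts at test time. A minimal PyTorch sketch follows; layer sizes and names are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ConceptBottleneck(nn.Module):
        def __init__(self, in_dim, n_concepts, n_classes):
            super().__init__()
            # x -> c: predict interpretable concepts (e.g. "bone spurs").
            self.concept_net = nn.Sequential(
                nn.Linear(in_dim, 64), nn.ReLU(),
                nn.Linear(64, n_concepts))
            # c -> y: the label depends only on the concepts, so editing
            # a concept at test time directly changes the prediction.
            self.label_net = nn.Linear(n_concepts, n_classes)

        def forward(self, x):
            c = torch.sigmoid(self.concept_net(x))
            return c, self.label_net(c)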
Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation
We aim to explain a black-box classifier with the form: ‘data X is classified as class Y because X has A, B and does not have C’, in which A, B, and C are high-level concepts. The challenge is that we…
CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms
TLDR
CARLA (Counterfactual And Recourse LibrAry) is a Python library for benchmarking counterfactual explanation methods across both different data sets and different machine learning models, together with a standardized set of integrated evaluation measures and data sets for transparent and extensive comparisons.
Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey
Broad Explainable Artificial Intelligence (Broad-XAI) moves away from interpreting individual decisions based on a single datum and aims to provide integrated explanations from multiple machine…
Explaining in Style: Training a GAN to explain a classifier in StyleSpace
TLDR
StylEx is presented, a method for training a generative model to specifically explain multiple attributes that underlie classifier decisions, by training a StyleGAN, which incorporates the classifier model, in order to learn a classifier-specific StyleSpace.
Information-Theoretic Methods in Deep Neural Networks: Recent Advances and Emerging Opportunities
We present a review on the recent advances and emerging opportunities around the theme of analyzing deep neural networks (DNNs) with information-theoretic methods. We first discuss popular…
Replication Study of “Generative Causal Explanations of Black-Box classifiers”
  • 2021
We verify the outcome of the methodology proposed in the article, which attempts to provide post-hoc causal explanations for black-box classifiers through causal inference. This is achieved by…
Reproducibility report Generative causal explanations of black-box classifiers
  • 2021
The paper by O’Shaughnessy et al. (2020) claims to have developed a method to disentangle the latent space of 3 generative models during training. The latent space then consists of variables with…
VAE-CE: Visual Contrastive Explanation using Disentangled VAEs
The goal of a classification model is to assign the correct labels to data. In most cases, this data is not fully described by the given set of labels. Often a rich set of meaningful concepts exist…

References

Showing 1-10 of 88 references
Explaining Classifiers with Causal Concept Effect (CaCE)
TLDR
This work defines the Causal Concept Effect (CaCE) as the causal effect of a human-interpretable concept on a deep neural net's predictions, and shows that the CaCE measure can avoid errors stemming from confounding.
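Concretely, CaCE contrasts the classifier's expected output under interventions that switch a concept on and off. A minimal sketch, assuming a conditional generator can render inputs with the concept forced to either value; the generate function and its signature are hypothetical.

    import numpy as np

    def cace(classifier, generate, cls=0, n=1000):
        # E[f(x) | do(concept = 1)] - E[f(x) | do(concept = 0)]:
        # the interventional effect of the concept on class `cls`.
        x_on = generate(concept=1, n=n)   # hypothetical generator
        x_off = generate(concept=0, n=n)
        return (classifier(x_on)[:, cls].mean()
                - classifier(x_off)[:, cls].mean())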
Interpretable & Explorable Approximations of Black Box Models
TLDR
This is the first approach that can produce global explanations of the behavior of any given black-box model through joint optimization of unambiguity, fidelity, and interpretability, while also allowing users to explore model behavior based on their preferences.
Interpreting Black Box Predictions using Fisher Kernels
TLDR
This work takes a novel look at black box interpretation of test predictions in terms of training examples, making use of Fisher kernels as the defining feature embedding of each data point, combined with Sequential Bayesian Quadrature (SBQ) for efficient selection of examples.
Causal Interpretability for Machine Learning - Problems, Methods and Evaluation
TLDR
This work presents a comprehensive survey of causal interpretable models, covering both problems and methods, and provides in-depth insights into the existing evaluation metrics for measuring interpretability, which can help practitioners understand for which scenarios each evaluation metric is suitable.
Understanding Black-box Predictions via Influence Functions
TLDR
This paper uses influence functions, a classic technique from robust statistics, to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying the training points most responsible for a given prediction.
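The core quantity is a gradient-Hessian product. A minimal numpy sketch of the influence of a training point on a test loss, assuming the per-point gradients and the (invertible) empirical Hessian at the trained parameters are already available:

    import numpy as np

    def influence_on_test_loss(grad_test, hessian, grad_train):
        # I_up,loss(z, z_test) = -grad L(z_test)^T  H^{-1}  grad L(z):
        # a positive value means upweighting training point z would
        # increase the loss at z_test (z is harmful for this prediction).
        return -grad_test @ np.linalg.solve(hessian, grad_train)

At scale, the paper avoids forming the Hessian explicitly and approximates the Hessian-inverse-vector product with stochastic estimation; the direct solve above only suits small models.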
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
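The mechanics fit in a few lines: sample perturbations around the instance, weight them by proximity, and fit a weighted linear surrogate whose coefficients serve as local attributions. A simplified tabular sketch; the perturbation scale, kernel width, and scalar-output f are assumptions, and the actual library additionally works in an interpretable binary representation.

    import numpy as np
    from sklearn.linear_model import Ridge

    def lime_explain(f, x, n=500, scale=0.1, width=0.5):
        # Sample perturbations in a neighborhood of the instance x.
        X = x + np.random.normal(scale=scale, size=(n, x.size))
        # Weight samples by an exponential proximity kernel.
        w = np.exp(-np.sum((X - x) ** 2, axis=1) / width ** 2)
        # Fit a weighted linear surrogate to the black-box outputs.
        surrogate = Ridge(alpha=1.0).fit(X, f(X), sample_weight=w)
        return surrogate.coef_  # local feature attributions

Here f would be a scalar output such as lambda X: clf.predict_proba(X)[:, 1].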
Explaining machine learning classifiers through diverse counterfactual explanations
TLDR
This work proposes a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes, and provides metrics that enable comparison of counterfactual-based methods to other local explanation methods.
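The determinantal-point-process idea enters as a diversity score over a candidate set of counterfactuals: the determinant of a similarity kernel is large only when the candidates are mutually distant. A small sketch using an inverse-distance kernel of the form reported in the paper, with the distance simplified to Euclidean here:

    import numpy as np

    def dpp_diversity(cfs):
        # K[i, j] = 1 / (1 + dist(cf_i, cf_j)); det(K) rewards sets of
        # counterfactuals that are far apart from one another.
        d = np.linalg.norm(cfs[:, None, :] - cfs[None, :, :], axis=-1)
        return np.linalg.det(1.0 / (1.0 + d))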
A causal framework for explaining the predictions of black-box sequence-to-sequence models
TLDR
This work interprets the predictions of any black-box structured input-structured output model around a specific input-output pair, adopting a variational autoencoder to yield meaningful input perturbations.
Learning Interpretable Models with Causal Guarantees
TLDR
This work proposes a framework for learning causal interpretable models from observational data that can be used to predict individual treatment effects, and proves an error bound on the treatment effects predicted by the model.
Explaining Deep Learning Models - A Bayesian Non-parametric Approach
TLDR
The empirical results indicate that the proposed approach not only outperforms the state-of-the-art techniques in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of the target ML models.