Corpus ID: 204743896

On Concept-Based Explanations in Deep Neural Networks

@article{Yeh2019OnCE,
  title={On Concept-Based Explanations in Deep Neural Networks},
  author={Chih-Kuan Yeh and Been Kim and Sercan {\"O}. Arik and Chun-Liang Li and Pradeep Ravikumar and Tomas Pfister},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.07969}
}
Deep neural networks (DNNs) build high-level intelligence on low-level raw features. Understanding of this high-level intelligence can be enabled by deciphering the concepts they base their decisions on, akin to human-level thinking. In this paper, we study concept-based explainability for DNNs in a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining a model's prediction behavior. Based on performance and…
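
The completeness idea sketched in the abstract can be probed empirically: if the model's own predictions can be recovered from the concept scores alone, the concept set is close to complete. The following is a minimal sketch of that reading, not the paper's implementation; the activations, concept vectors, and model predictions are synthetic stand-ins.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic stand-ins for intermediate-layer activations, candidate concept
    # vectors, and the model's own predicted labels (not ground-truth labels).
    rng = np.random.default_rng(0)
    activations = rng.normal(size=(1000, 64))         # n_samples x hidden_dim
    concept_vectors = rng.normal(size=(5, 64))        # n_concepts x hidden_dim
    model_predictions = (activations @ rng.normal(size=64) > 0).astype(int)

    # Concept scores: projection of each activation onto each concept direction.
    concept_scores = activations @ concept_vectors.T  # n_samples x n_concepts

    def prediction_recovery_accuracy(features, preds):
        """Accuracy of predicting the model's output from the given features."""
        X_tr, X_te, y_tr, y_te = train_test_split(features, preds, random_state=0)
        return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

    # Rough completeness: how much of the model's behavior the concept scores
    # explain, normalized between a constant predictor and the full activations.
    acc_concepts = prediction_recovery_accuracy(concept_scores, model_predictions)
    acc_full = prediction_recovery_accuracy(activations, model_predictions)
    acc_baseline = max(model_predictions.mean(), 1 - model_predictions.mean())
    completeness = (acc_concepts - acc_baseline) / (acc_full - acc_baseline)
    print(f"approximate completeness: {completeness:.2f}")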
A Concept-based Abstraction-Aggregation Deep Neural Network for Interpretable Document Classification
TLDR
A corpus-level explanation approach is proposed, which aims to capture causal relationships between keywords and model predictions by learning the importance of keywords for predicted labels across a training corpus based on attention weights.
Corpus-level and Concept-based Explanations for Interpretable Document Classification
TLDR
A corpus-level explanation approach is proposed that captures causal relationships between keywords and model predictions by learning the importance of keywords for predicted labels across a training corpus based on attention weights, along with a concept-based explanation method that automatically learns higher-level concepts and their importance to the model's prediction tasks.
Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey
TLDR
A taxonomy is proposed that categorizes XAI techniques by their scope of explanation, the methodology behind the algorithms, and the explanation level or usage, helping to build trustworthy, interpretable, and self-explanatory deep learning models.
Interpreting Deep Neural Networks through Prototype Factorization
TLDR
This work proposes ProtoFac, an explainable matrix factorization technique that decomposes the latent representations at any selected layer of a pre-trained DNN into a collection of weighted prototypes, which are a small number of exemplars extracted from the original data.
Now You See Me (CME): Concept-based Model Extraction
TLDR
This work presents CME, a concept-based model extraction framework for analysing DNN models via concept-based extracted models, and demonstrates how CME can be used to analyse the concept information learned by a DNN model.
Comprehensible Convolutional Neural Networks via Guided Concept Learning
TLDR
This work proposes a guided learning approach with an additional concept layer in a CNN-based architecture to learn the associations between visual features and word phrases, and designs an objective function that optimizes both prediction accuracy and the semantics of the learned feature representations.
Adversarial TCAV - Robust and Effective Interpretation of Intermediate Layers in Neural Networks
TLDR
A simple and scalable modification is proposed that employs a Gram-Schmidt process to sample random noise from concepts and learn an average "concept classifier", improving the robustness and effectiveness of concept activations.
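
One way to read the Gram-Schmidt step mentioned in that summary is as projecting random noise vectors onto the orthogonal complement of the concept directions. The sketch below illustrates only that linear-algebra step, under that assumed reading; it is not the paper's code.

    import numpy as np

    def orthogonalize_noise(noise, concept_vectors):
        """Remove the span of the concept vectors from random noise samples
        via a Gram-Schmidt-style projection."""
        basis = []
        for c in concept_vectors:
            for b in basis:                      # orthonormalize the concepts
                c = c - np.dot(c, b) * b
            norm = np.linalg.norm(c)
            if norm > 1e-8:
                basis.append(c / norm)
        for b in basis:                          # strip each component from the noise
            noise = noise - np.outer(noise @ b, b)
        return noise

    rng = np.random.default_rng(0)
    concepts = rng.normal(size=(3, 16))
    noise = rng.normal(size=(100, 16))
    clean_noise = orthogonalize_noise(noise, concepts)
    print(np.abs(clean_noise @ concepts.T).max())  # ~0: no leftover concept signal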
Towards Fully Interpretable Deep Neural Networks: Are We There Yet?
TLDR
This paper reviews existing methods for developing DNNs with intrinsic interpretability, with a focus on Convolutional Neural Networks (CNNs), identifies gaps in current work, and suggests potential research directions.
CACTUS: Detecting and Resolving Conflicts in Objective Functions
TLDR
This paper prototypes a technique to visualize multi-objective functions, defined either in a Jupyter notebook or through an interactive visual interface, to help users interactively specify meaningful objective functions by resolving potential conflicts for a classification task, and demonstrates the approach in a visual analytics (VA) system.
Debiasing Concept Bottleneck Models with Instrumental Variables
TLDR
The problem of concepts being correlated with confounding information in the features is studied, and a new causal prior graph for modeling the impacts of unobserved variables, together with a method that removes the impact of confounding information and noise using instrumental variable techniques, is proposed.

References

Showing 1-10 of 33 references
Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks
TLDR
DTCAV (Discovery TCAV) is introduced, a global concept-based interpretability method that can automatically discover concepts as image segments, along with each concept's estimated importance for a deep neural network's predictions, and it is validated that discovered concepts are as coherent to humans as hand-labeled concepts.
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
TLDR
Concept Activation Vectors (CAVs) are introduced, which provide an interpretation of a neural net's internal state in terms of human-friendly concepts, and may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
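
TCAV's core computation is compact enough to sketch: fit a linear classifier that separates activations of concept examples from activations of random examples, take its normal vector as the Concept Activation Vector, and report the fraction of inputs whose class-logit gradient points in that direction. The activations and gradients below are synthetic placeholders rather than outputs of a real network.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    hidden_dim = 32

    # Placeholder bottleneck activations for concept images vs. random images.
    concept_acts = rng.normal(loc=0.5, size=(200, hidden_dim))
    random_acts = rng.normal(loc=0.0, size=(200, hidden_dim))

    # 1) CAV: the normal vector of a linear separator between the two sets.
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    cav /= np.linalg.norm(cav)

    # 2) TCAV score: fraction of class-k inputs whose logit increases when the
    #    activation moves along the CAV (positive directional derivative).
    #    These gradients stand in for d(logit_k)/d(activation).
    logit_grads = rng.normal(size=(500, hidden_dim))
    tcav_score = float((logit_grads @ cav > 0).mean())
    print(f"TCAV score: {tcav_score:.2f}")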
Explaining Explanations: An Overview of Interpretability of Machine Learning
There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide…
Towards better understanding of gradient-based attribution methods for Deep Neural Networks
TLDR
This work analyzes four gradient-based attribution methods, formally proves conditions of equivalence and approximation between them, and constructs a unified framework that enables direct comparison as well as easier implementation.
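
As a concrete instance of the family of methods that reference compares, plain gradient and gradient-times-input attributions for a tiny hand-written network might look as follows; the two-layer network is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    V = rng.normal(size=(8, 4))   # first-layer weights of a toy network
    w = rng.normal(size=8)        # output weights: y = w . relu(V x)

    def input_gradient(x):
        # dy/dx for the toy network; the ReLU derivative is a 0/1 mask.
        mask = (V @ x > 0).astype(float)
        return V.T @ (w * mask)

    x = rng.normal(size=4)
    grad = input_gradient(x)
    saliency = np.abs(grad)        # plain gradient magnitude
    grad_times_input = grad * x    # gradient x input attribution
    print(saliency, grad_times_input)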
A Unified Approach to Interpreting Model Predictions
TLDR
A unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), which unifies six existing methods and presents new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
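
The quantity SHAP builds on, the Shapley value, can be computed exactly for a handful of features by averaging marginal contributions over all coalitions. The brute-force sketch below is only meant to make the definition concrete; the actual library relies on much faster approximations.

    import itertools
    import math
    import numpy as np

    def exact_shapley_values(value_fn, n_features):
        """Brute-force Shapley values for a coalition value function value_fn(S),
        where S is a tuple of feature indices. Exponential cost: toy sizes only."""
        phi = np.zeros(n_features)
        for i in range(n_features):
            others = [j for j in range(n_features) if j != i]
            for r in range(n_features):
                for S in itertools.combinations(others, r):
                    weight = (math.factorial(len(S)) *
                              math.factorial(n_features - len(S) - 1) /
                              math.factorial(n_features))
                    phi[i] += weight * (value_fn(S + (i,)) - value_fn(S))
        return phi

    # Toy additive model: the value of a coalition is the sum of its features,
    # so each feature's Shapley value equals its own contribution.
    x = np.array([1.0, 2.0, -0.5])
    value_fn = lambda S: float(x[list(S)].sum())
    print(exact_shapley_values(value_fn, len(x)))  # -> [ 1.   2.  -0.5]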
Interpretable Basis Decomposition for Visual Explanation
TLDR
A new framework called Interpretable Basis Decomposition for providing visual explanations for classification networks is proposed, decomposing the neural activations of the input image into semantically interpretable components pre-trained from a large concept corpus.
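
A schematic reading of that decomposition is an ordinary least-squares fit of an activation vector onto a set of pre-trained concept vectors, with the residual capturing what the concepts miss; the actual framework selects and ranks components more carefully, and the shapes below are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    concept_basis = rng.normal(size=(10, 512))  # 10 hypothetical concept vectors
    activation = rng.normal(size=512)           # activation of one input image

    # Least-squares weight of each concept in the activation, plus a residual.
    weights, *_ = np.linalg.lstsq(concept_basis.T, activation, rcond=None)
    residual = activation - concept_basis.T @ weights

    # Rank concepts by the magnitude of their contribution.
    order = np.argsort(-np.abs(weights))
    print(order[:3], np.linalg.norm(residual) / np.linalg.norm(activation))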
EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction
TLDR
A new self-interpretable model is presented that performs output prediction and simultaneously provides an explanation in terms of the presence of particular concepts in the input, based on a low-dimensional binary representation of the input.
Attention-Based Prototypical Learning Towards Interpretable, Confident and Robust Deep Neural Networks
We propose a new framework for prototypical learning that bases decision-making on few relevant examples that we call prototypes. Our framework utilizes an attention mechanism that relates the…
Evaluating the Visualization of What a Deep Neural Network Has Learned
TLDR
A general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps, which shows that the recently proposed layer-wise relevance propagation algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method.
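
That protocol amounts to deleting the regions a heatmap ranks as most relevant, in order, and tracking how quickly the model's score drops, with a faster drop indicating a more faithful explanation. A minimal version with a stand-in scoring function might look like this:

    import numpy as np

    def perturbation_curve(image, heatmap, score_fn, patch=4, seed=0):
        """Perturb the most-relevant patch first, then the next, recording the
        model score after each step (a faster drop means a better heatmap)."""
        rng = np.random.default_rng(seed)
        img = image.copy()
        h, w = heatmap.shape
        patches = [(r, c) for r in range(0, h, patch) for c in range(0, w, patch)]
        # Rank non-overlapping patches by the heatmap mass they contain.
        patches.sort(key=lambda rc: -heatmap[rc[0]:rc[0]+patch, rc[1]:rc[1]+patch].sum())
        scores = [score_fn(img)]
        for r, c in patches:
            img[r:r+patch, c:c+patch] = rng.uniform(size=(patch, patch))
            scores.append(score_fn(img))
        return np.array(scores)

    # Toy check: the "model" scores the top-left quadrant, the heatmap points
    # there, so the score should collapse within the first few perturbation steps.
    rng = np.random.default_rng(1)
    image = rng.uniform(0.8, 1.0, size=(16, 16))
    heatmap = np.zeros((16, 16)); heatmap[:8, :8] = 1.0
    print(perturbation_curve(image, heatmap, lambda im: im[:8, :8].mean()))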
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
TLDR
LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
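
The local-surrogate idea is easy to sketch for tabular inputs: sample perturbations around the instance, weight them by proximity, and fit a simple weighted linear model to the black box's outputs on those samples. This is a from-scratch illustration of the idea, not the lime package's implementation, and the toy black box is made up.

    import numpy as np
    from sklearn.linear_model import Ridge

    def lime_style_explanation(black_box, x, n_samples=500, sigma=0.5, seed=0):
        """Fit a locally weighted linear surrogate around instance x and return
        its coefficients as per-feature explanations."""
        rng = np.random.default_rng(seed)
        samples = x + rng.normal(scale=sigma, size=(n_samples, len(x)))
        preds = np.array([black_box(s) for s in samples])
        # Proximity kernel: nearby samples get larger weights.
        dists = np.linalg.norm(samples - x, axis=1)
        weights = np.exp(-(dists ** 2) / (2 * sigma ** 2))
        surrogate = Ridge(alpha=1.0).fit(samples - x, preds, sample_weight=weights)
        return surrogate.coef_

    # Toy black box: a nonlinear function of three features; the coefficients
    # approximate its local sensitivities around x.
    black_box = lambda z: float(np.tanh(2 * z[0]) + z[1] ** 2 - 0.5 * z[2])
    x = np.array([0.1, 1.0, -0.3])
    print(lime_style_explanation(black_box, x))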