CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines

Arjun Reddy Akula, Shuai Wang, Song-Chun Zhu
AAAI Conference on Artificial Intelligence
We present CoCoX (short for Conceptual and Counterfactual Explanations), a model for explaining decisions made by a deep convolutional neural network (CNN). In Cognitive Psychology, the factors (or semantic-level features) that humans zoom in on when they imagine an alternative to a model prediction are often referred to as fault-lines. Motivated by this, our CoCoX model explains decisions made by a CNN using fault-lines. Specifically, given an input image I for which a CNN classification model… 

Meaningfully debugging model mistakes using conceptual counterfactual explanations

This paper proposes a systematic approach, conceptual counterfactual explanations (CCE), that explains why a classifier makes a mistake on particular test samples in terms of human-understandable concepts, and validates the approach on well-known pretrained models, showing that it explains the models’ mistakes meaningfully.

Counterfactual Explanations for Misclassified Images: How Human and Machine Explanations Differ

Counterfactual explanations have emerged as a popular solution for the eXplainable AI (XAI) problem of elucidating the predictions of black-box deep-learning systems due to their psychological

Counterfactual Explanations in Explainable AI: A Tutorial

This tutorial will introduce the cognitive concept and characteristics of counterfactual explanation, its computational form, mainstream methods, and various adaptations to different explanation settings, and outline potential applications of counterfactual explanations such as data augmentation or conversational systems.

Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review

A rubric of desirable properties of counterfactual explanation algorithms is designed, and all currently proposed algorithms are evaluated against it, providing easy comparison and comprehension of the advantages and disadvantages of different approaches.

Mind the Context: The Impact of Contextualization in Neural Module Networks for Grounding Visual Referring Expressions

This work substantially reduces the number of modules in NMN by up to 75% by parameterizing the module arguments, and proposes a method to contextualize the parameterized model to enhance the module’s capacity in exploiting the visiolinguistic associations.

Concept-based Explanations using Non-negative Concept Activation Vectors and Decision Tree for CNN Models

One of the state-of-the-art concept-based models is modified based on the requirements of accuracy, performance, and interpretability, which increases interpretability for Convolutional Neural Network models and boosts the fidelity and performance of the explainer used.

Making Heads or Tails: Towards Semantically Consistent Visual Counterfactuals

This work presents a novel framework for computing visual counterfactual explanations based on two key ideas, enforcing that the replaced and replacer regions contain the same semantic part, resulting in more semantically consistent explanations.

ECINN: Efficient Counterfactuals from Invertible Neural Networks

This work proposes a method, ECINN, that utilizes the generative capacities of invertible neural networks for image classification to generate counterfactual examples in the time of only two evaluations, in contrast to competing methods that sometimes need a thousand evaluations or more.

Explainable image classification with evidence counterfactual

SEDC is introduced as a model-agnostic instance-level explanation method for image classification that does not need access to the training data. It is benchmarked against existing model-agnostic explanation methods, demonstrating stability of results, computational efficiency, and the counterfactual nature of the explanations.



Generating Counterfactual Explanations with Natural Language

This work considers a fine-grained image classification task and proposes an intuitive method to generate counterfactual explanations by inspecting which evidence in an input is missing, but might contribute to a different classification decision if present in the image.

Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives

A novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network is proposed and it is argued that such explanations are natural for humans and are used commonly in domains such as health care and criminology.

Generating Visual Explanations

A new model is proposed that focuses on the discriminating properties of the visible object, jointly predicts a class label, and explains why the predicted label is appropriate for the image, and generates sentences that realize a global sentence property, such as class specificity.

Natural Language Interaction with Explainable AI Models

This paper presents an explainable AI (XAI) system that provides explanations for its predictions, and identifies several correlations between users' questions and the XAI answers using the YouTube Action dataset.

Counterfactual Visual Explanations

It is found that users trained to distinguish bird species fare better when given access to counterfactual explanations in addition to training examples, and the effectiveness of these explanations in teaching humans is explored.

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

This work proposes a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable, and shows that even non-attention-based models learn to localize discriminative regions of the input image.

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Concept Activation Vectors (CAVs) are introduced, which provide an interpretation of a neural net's internal state in terms of human-friendly concepts, and may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.

Explainable AI as Collaborative Task Solving

Compared to the most widely used attribution-based explanations (saliency maps), the new framework X-ToM significantly improves human trust in the underlying vision system and quantitatively evaluates humans' trust in machine behaviors.

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.

Attention is not Explanation

This work performs extensive experiments across a variety of NLP tasks to assess the degree to which attention weights provide meaningful “explanations” for predictions, and finds that they largely do not.