• Corpus ID: 49432754

Generating Counterfactual Explanations with Natural Language

Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata
Natural language explanations of deep neural network decisions provide an intuitive way for an AI agent to articulate its reasoning process. To demonstrate our method, we consider a fine-grained image classification task in which we take as input an image and a counterfactual class, and output text explaining why the image does not belong to the counterfactual class. We then analyze our generated counterfactual explanations both qualitatively and quantitatively using proposed automatic metrics.
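The task setup can be illustrated with a toy sketch: compare the attributes detected in an image against the attributes typical of the counterfactual class, and name the class attributes the image lacks. All function and attribute names here are illustrative assumptions, not the paper's actual model, which learns a language generator rather than using a template.

```python
def counterfactual_explanation(image_attrs, cf_class, cf_class_attrs):
    """Toy template-based stand-in for a learned explanation model:
    explain why the image is NOT cf_class by listing class-typical
    attributes that are absent from the image."""
    missing = [a for a in cf_class_attrs if a not in image_attrs]
    if not missing:
        return f"No evidence against class '{cf_class}'."
    return (f"This is not a {cf_class} because it does not have "
            + " or ".join(missing) + ".")

# Hypothetical example: a bird image lacking attributes typical of a cardinal
attrs = {"yellow belly", "black wings"}
print(counterfactual_explanation(attrs, "cardinal", ["red body", "crest"]))
# -> This is not a cardinal because it does not have red body or crest.
```

The real model conditions a recurrent language generator on image features and the counterfactual class label; the template above only mirrors the input/output interface.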


Generating Natural Counterfactual Visual Explanations
This research uses the model's predictions on counterfactual images to find the attributes with the greatest effect when the model predicts classes A and B, applies this method to a fine-grained image classification dataset, and uses a generative adversarial network to generate natural counterfactual visual explanations.
KACE: Generating Knowledge Aware Contrastive Explanations for Natural Language Inference
This paper focuses on generating contrastive explanations with counterfactual examples in NLI and proposes a novel Knowledge-Aware Contrastive Explanation generation framework (KACE), which is shown to better distinguish confusing candidates and to improve data efficiency in other research fields.
CoCoX: Generating Conceptual and Counterfactual Explanations via Fault-Lines
It is argued that, due to the conceptual and counterfactual nature of fault-lines, the CoCoX explanations are practical and more natural for both expert and non-expert users to understand the internal workings of complex deep learning models.
Contrastive Explanations for Model Interpretability
The ability of label-contrastive explanations to provide fine-grained interpretability of model decisions is demonstrated, via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks.
Towards Relatable Explainable AI with the Perceptual Process
This work found that counterfactual explanations were useful, and were further enhanced by semantic cues, whereas saliency explanations were not; it proposes the XAI Perceptual Processing Framework and the RexNet model for relatable explainable AI with Contrastive Saliency, Counterfactual Synthetic, and Contrastive Cues explanations.
Counterfactual attribute-based visual explanations for classification
The attribute-based explanations method is verified both quantitatively and qualitatively and it is shown that attributes provide discriminating and human understandable explanations for both standard as well as robust networks.
SCOUT: Self-Aware Discriminant Counterfactual Explanations
  • Pei Wang, N. Vasconcelos
  • Computer Science
    2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
It is argued that self-awareness, namely the ability to produce classification confidence scores, is important for the computation of discriminant explanations, which seek to identify regions where it is easy to discriminate between prediction and counter class.
NILE : Natural Language Inference with Faithful Natural Language Explanations
This work proposes Natural-language Inference over Label-specific Explanations (NILE), a novel NLI method which utilizes auto-generated label-specific NL explanations to produce labels along with their faithful explanations, and demonstrates NILE's effectiveness over previously reported methods through automated and human evaluation of the produced labels and explanations.
COIN: Counterfactual Image Generation for Visual Question Answering Interpretation
An interpretability approach for VQA models that generates counterfactual images, such that the generated image has the minimal possible change to the original image yet leads the VQA model to give a different answer.


Grounding Visual Explanations
A phrase-critic model refines generated candidate explanations, augmented with flipped phrases, to improve the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.
Generating Visual Explanations
A new model is proposed that focuses on the discriminating properties of the visible object, jointly predicts a class label, and explains why the predicted label is appropriate for the image, and generates sentences that realize a global sentence property, such as class specificity.
Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives
A novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network is proposed and it is argued that such explanations are natural for humans and are used commonly in domains such as health care and criminology.
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
It is quantitatively shown that training with the textual explanations not only yields better textual justification models, but also better localizes the evidence that supports the decision, supporting the thesis that multimodal explanation models offer significant benefits over unimodal approaches.
InterpNET: Neural Introspection for Interpretable Deep Learning
A new way to design interpretable neural networks for classification, inspired by physiological evidence of the human visual system's inner-workings is introduced, termed InterpNET, which can be combined with any existing classification architecture to generate natural language explanations of the classifications.
Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR
It is suggested data controllers should offer a particular type of explanation, unconditional counterfactual explanations, to support these three aims, which describe the smallest change to the world that can be made to obtain a desirable outcome, or to arrive at the closest possible world, without needing to explain the internal logic of the system.
Learning Deep Representations of Fine-Grained Visual Descriptions
This model achieves strong performance on zero-shot text-based image retrieval and significantly outperforms the attribute-based state-of-the-art for zero-shot classification on the Caltech-UCSD Birds 200-2011 dataset.
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question answer pairs.
Modeling Relationships in Referential Expressions with Compositional Modular Networks
This paper presents a modular deep architecture capable of analyzing referential expressions into their component parts, identifying entities and relationships mentioned in the input expression and grounding them all in the scene.
Adversarial Perturbations Against Deep Neural Networks for Malware Classification
This paper shows how to construct highly effective adversarial sample crafting attacks for neural networks used as malware classifiers, and evaluates the extent to which potential defensive mechanisms against adversarial crafting can be leveraged in the setting of malware classification.