Contrastive Explanations for Model Interpretability
  • Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg
Contrastive explanations clarify why an event occurred in contrast to another. They are inherently more intuitive for humans to both produce and comprehend. We propose a methodology to produce contrastive explanations for classification models by modifying the representation to disregard non-contrastive information, and modifying model behavior to be based only on contrastive reasoning. Our method is based on projecting model representation to a latent space that captures only the features that…
Explainability in supply chain operational risk management: A systematic literature review
Risk managers are encouraged to choose auditable techniques for supply chain operational risk management, ensuring that they know why they should take a particular risk management action rather than just what they should do to manage the operational risks.
Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction
AlterRep, an intervention-based method, is applied to study how BERT models of different sizes process relative clauses (RCs); the analysis finds that BERT variants use RC boundary information during word prediction in a manner consistent with the rules of English grammar.
ECINN: Efficient Counterfactuals from Invertible Neural Networks
A method is proposed, ECINN, that utilizes the generative capacities of invertible neural networks for image classification to generate counterfactual examples efficiently; it outperforms established methods that generate heatmap-based explanations.
Explaining NLP Models via Minimal Contrastive Editing (MiCE)
It is demonstrated how MiCE edits can be used for two use cases in NLP system development, debugging incorrect model outputs and uncovering dataset artifacts, thereby illustrating that producing contrastive explanations is a promising research direction for model interpretability.
Explaining the Road Not Taken
This paper summarizes the common forms of explanations used in over 200 recent papers about natural language processing (NLP), and compares them against user questions collected in the XAI Question Bank, and finds that most model interpretations cannot answer these questions.
Generating High-Quality Explanations for Navigation in Partially-Revealed Environments
We present an approach for generating natural language explanations of the high-level behavior of autonomous agents navigating in partially-revealed environments. Our counterfactual explanations…
Interpreting Deep Learning Models in Natural Language Processing: A Review
In this survey, a comprehensive review of various interpretation methods for neural models in NLP is provided, including a high-level taxonomy for interpretation methods in NLP; deficiencies of current methods are pointed out and some avenues for future research are suggested.
Learning with Instance Bundles for Reading Comprehension
Drawing on ideas from contrastive estimation, several new supervision techniques are introduced that compare question-answer scores across multiple related instances, and normalize these scores across various neighborhoods of closely contrasting questions and/or answers.
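The bundle-level supervision described above amounts to normalizing candidate scores across a set of closely contrasting instances. A minimal sketch of that idea, a log-softmax loss over a bundle (the function and argument names here are illustrative, not the paper's API):

```python
import numpy as np

def bundle_nll(scores, gold_idx):
    """Contrastive-estimation-style loss: normalize question-answer
    scores across a bundle of closely contrasting candidates and
    return the negative log-likelihood of the gold candidate."""
    z = scores - scores.max()                # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum())  # log-softmax over the bundle
    return -log_probs[gold_idx]
```

Raising the gold candidate's score relative to its contrasting neighbors lowers the loss, which is the comparative training signal instance bundles provide.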
RoBERTa: A Robustly Optimized BERT Pretraining Approach
It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
CausaLM: Causal Model Explanation Through Counterfactual Language Models
CausaLM is proposed, a framework for producing causal model explanations using counterfactual language representation models based on fine-tuning of deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem.
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
There is substantial room for improvement in NLI systems, and the HANS dataset, which contains many examples where the heuristics fail, can motivate and measure progress in this area.
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
The inability to infer behavioral conclusions from probing results is pointed out, and an alternative method is offered that focuses on how the information is being used rather than on what information is encoded.
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
This work presents Iterative Null-space Projection (INLP), a novel method for removing information from neural representations based on repeated training of linear classifiers that predict a certain property the authors aim to remove, followed by projection of the representations on their null-space.
Annotation Artifacts in Natural Language Inference Data
It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
Aligning Faithful Interpretations with their Social Attribution
It is found that the requirement that model interpretations be faithful is vague and incomplete; faithfulness is reformulated as an accurate attribution of causality to the model, and aligned faithfulness, i.e., faithful causal chains that are aligned with their expected social behavior, is introduced.
A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence
This work conducts a systematic literature review which provides readers with a thorough and reproducible analysis of the interdisciplinary research field under study and defines a taxonomy regarding both theoretical and practical approaches to contrastive and counterfactual explanation.
Contrastive explanation: a structural-model approach
  • Tim Miller
  • The Knowledge Engineering Review
  • 2021
This model can help researchers in subfields of artificial intelligence to better understand contrastive explanation, and it is demonstrated on two classical problems in artificial intelligence: classification and planning.