Corpus ID: 232092617

Contrastive Explanations for Model Interpretability

@article{Jacovi2021ContrastiveEF,
  title={Contrastive Explanations for Model Interpretability},
  author={Alon Jacovi and Swabha Swayamdipta and Shauli Ravfogel and Yanai Elazar and Yejin Choi and Yoav Goldberg},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.01378}
}
Contrastive explanations clarify why an event occurred in contrast to another. They are inherently more intuitive for humans to both produce and comprehend. We propose a methodology to produce contrastive explanations for classification models by modifying the representation to disregard non-contrastive information, and modifying model behavior to only be based on contrastive reasoning. Our method is based on projecting model representation to a latent space that captures only the features that…
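
The truncated abstract does not spell out how this projection is constructed. As a purely illustrative sketch (not necessarily the paper's exact method), one way to realize a contrastive projection for a linear classifier head is to keep only the component of the representation that lies along the direction separating the weight vectors of the two candidate decisions (the fact and the foil). All names and shapes below are hypothetical.

```python
import numpy as np

def contrastive_projection(h, W, fact, foil):
    """Keep only the part of representation h that separates `fact` from `foil`.

    h    : (d,) hidden representation of one example
    W    : (num_classes, d) weight matrix of a linear classifier head
    fact : index of the predicted class
    foil : index of the contrast class ("why fact rather than foil?")

    Illustrative construction: project h onto the direction W[fact] - W[foil],
    discarding every feature that does not change the margin between the two
    candidate decisions.
    """
    u = W[fact] - W[foil]
    u = u / np.linalg.norm(u)      # unit vector separating the two classes
    return np.outer(u, u) @ h      # rank-1 projection onto that direction

# Toy usage: a 4-class head over 8-dimensional representations.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
h = rng.normal(size=8)
h_contrastive = contrastive_projection(h, W, fact=2, foil=0)
# The fact-vs-foil margin is preserved exactly; all other directions are zeroed out.
print(W[2] @ h - W[0] @ h, W[2] @ h_contrastive - W[0] @ h_contrastive)
```
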
Citations

ECINN: Efficient Counterfactuals from Invertible Neural Networks
TLDR: A method, ECINN, is proposed that utilizes the generative capacities of invertible neural networks for image classification to generate counterfactual examples efficiently, and it outperforms established methods that generate heatmap-based explanations.
Explaining NLP Models via Minimal Contrastive Editing (MiCE)
TLDR: It is demonstrated how MiCE edits can be used for two use cases in NLP system development (debugging incorrect model outputs and uncovering dataset artifacts), thereby illustrating that producing contrastive explanations is a promising research direction for model interpretability.
Explaining the Road Not Taken
TLDR: This paper summarizes the common forms of explanation used in over 200 recent papers about natural language processing (NLP), compares them against user questions collected in the XAI Question Bank, and finds that most model interpretations cannot answer these questions.
Learning with Instance Bundles for Reading Comprehension
TLDR: Drawing on ideas from contrastive estimation, several new supervision techniques are introduced that compare question-answer scores across multiple related instances and normalize these scores over various neighborhoods of closely contrasting questions and/or answers.
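
The bundle-level supervision described above can be illustrated with a small, hypothetical sketch of contrastive estimation over one bundle: rather than scoring a question-answer pair in isolation, scores are normalized across the gold pair and its closely contrasting neighbors. The loss below is an illustration of that idea, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def bundle_nll(scores, gold_index):
    """Contrastive-estimation-style loss over one instance bundle.

    scores     : (k,) model scores for k closely related question-answer pairs
                 (one gold pair plus its contrasting neighbors)
    gold_index : position of the gold pair inside the bundle

    Probability mass is normalized across the whole bundle, so the model is
    trained to prefer the gold pair over its near-miss contrasts. Illustrative
    only; the paper explores several bundle-level objectives.
    """
    log_probs = F.log_softmax(scores, dim=0)
    return -log_probs[gold_index]

# Toy usage: one gold answer and two contrastive alternatives.
scores = torch.tensor([2.1, 1.9, -0.5], requires_grad=True)
loss = bundle_nll(scores, gold_index=0)
loss.backward()
print(float(loss), scores.grad)
```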

References

Showing 1-10 of 67 references
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR: It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
CausaLM: Causal Model Explanation Through Counterfactual Language Models
TLDR: CausaLM is proposed, a framework for producing causal model explanations using counterfactual language representation models, based on fine-tuning deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem.
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
TLDR: There is substantial room for improvement in NLI systems, and the HANS dataset, which contains many examples where these syntactic heuristics fail, can motivate and measure progress in this area.
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
TLDR: The inability to infer behavioral conclusions from probing results is pointed out, and an alternative method is offered that focuses on how the information is being used rather than on what information is encoded.
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
TLDR: This work presents Iterative Null-space Projection (INLP), a novel method for removing information from neural representations, based on repeated training of linear classifiers that predict a property the authors aim to remove, followed by projection of the representations onto the classifiers' null-space.
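
A rough, simplified sketch of the INLP idea (not a faithful reimplementation of the published method): repeatedly fit a linear probe for the property to be removed, then project the representations onto the null-space of the probe's weight vector so that this direction can no longer carry the property.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, z, n_iters=5):
    """Simplified sketch of Iterative Null-space Projection (INLP).

    X : (n, d) representations
    z : (n,)   labels of the property to remove (e.g., a protected attribute)

    Each round fits a linear probe on the currently projected representations
    and then removes the probe's weight direction via a null-space projection.
    """
    d = X.shape[1]
    P = np.eye(d)  # accumulated projection; apply as X @ P
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(X @ P, z)
        w = clf.coef_[0]
        w = w / np.linalg.norm(w)
        P = P @ (np.eye(d) - np.outer(w, w))  # remove this probe's direction
    return P

# Toy usage: dimension 0 leaks a binary attribute; after INLP a fresh probe
# should score close to chance on the guarded representations.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 10))
X[:, 0] += 3 * z
P = inlp(X, z)
print(LogisticRegression(max_iter=1000).fit(X @ P, z).score(X @ P, z))
```
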
Annotation Artifacts in Natural Language Inference Data
TLDR: It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
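
A hypothesis-only baseline of this kind is simple to set up; the sketch below uses a handful of made-up hypotheses rather than SNLI or MultiNLI (so the accuracies quoted above do not apply), but it shows the essential point: the premise is never given to the classifier, so anything it learns reflects artifacts in how hypotheses were written.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical, tiny stand-in for an NLI dataset: hypotheses and labels only,
# no premises at all.
hypotheses = [
    "A man is sleeping.",
    "A person is outdoors.",
    "Nobody is playing an instrument.",
    "The animal is eating some food.",
]
labels = ["contradiction", "entailment", "contradiction", "neutral"]

# Bag-of-words classifier over hypotheses only; above-majority accuracy on a
# real benchmark would indicate annotation artifacts.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(hypotheses, labels)
print(baseline.predict(["No one is sleeping."]))
```
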
Aligning Faithful Interpretations with their Social Attribution
TLDR: It is found that the requirement that model interpretations be faithful is vague and incomplete; faithfulness is reformulated as an accurate attribution of causality to the model, and the notion of aligned faithfulness (faithful causal chains that are aligned with their expected social behavior) is introduced.
A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence
TLDR: This work conducts a systematic literature review that provides readers with a thorough and reproducible analysis of the interdisciplinary research field under study, and defines a taxonomy covering both theoretical and practical approaches to contrastive and counterfactual explanation.
Explaining NLP Models via Minimal Contrastive Editing (MiCE)
TLDR: It is demonstrated how MiCE edits can be used for two use cases in NLP system development (debugging incorrect model outputs and uncovering dataset artifacts), thereby illustrating that producing contrastive explanations is a promising research direction for model interpretability.