AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models

Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, Sameer Singh
Neural NLP models are increasingly accurate but are imperfect and opaque---they break in counterintuitive ways and leave end users puzzled at their behavior. The toolkit provides interpretation primitives (e.g., input gradients) for any AllenNLP model and task, a suite of built-in interpretation methods, and a library of front-end visualization components. We demonstrate the toolkit's flexibility and utility by implementing live demos for five interpretation methods (e.g., saliency maps and…
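As a rough illustration of the "input gradients" primitive the abstract mentions, the sketch below computes gradient-times-input saliency for a toy linear scorer. This is a hypothetical minimal example, not the AllenNLP API: for f(x) = w . x the input gradient is simply w, so |w_i * x_i| ranks feature influence.

```python
import numpy as np

def saliency(w, x):
    """Gradient-times-input saliency for the linear scorer f(x) = w . x."""
    grad = w                    # df/dx is exactly w for a linear model
    return np.abs(grad * x)    # larger value -> more influential feature

w = np.array([0.5, -2.0, 0.1])   # toy model weights (assumed)
x = np.array([1.0, 1.0, 3.0])    # toy input
print(saliency(w, x))            # the middle feature dominates here
```

Real interpretation toolkits compute the same quantity by backpropagating the model's score to its input embeddings, then reducing each token's gradient vector to a scalar.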


Interpreting Predictions of NLP Models

This tutorial will provide a background on interpretation techniques, i.e., methods for explaining the predictions of NLP models, and present a thorough study of example-specific interpretations, including saliency maps, input perturbations, and influence functions.

Towards Faithful Model Explanation in NLP: A Survey

This survey first discusses the definition and evaluation of Faithfulness, as well as its significance for explainability, and introduces the recent advances in faithful explanation by grouping approaches into five categories: similarity methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models.

Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions

It is found that influence functions are particularly useful for natural language inference, a task in which 'saliency maps' may not have a clear interpretation, and a new quantitative measure based on influence functions is developed that can reveal artifacts in training data.

Interpreting Deep Learning Models in Natural Language Processing: A Review

In this survey, a comprehensive review of various interpretation methods for neural models in NLP is provided, including a high-level taxonomy for interpretation methods in NLP; deficiencies of current methods are pointed out and avenues for future research are suggested.

UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

SQuARE v2, the new version of SQuARE, is introduced to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations, and multiple adversarial attacks to compare the robustness of QA models are provided.

InterpreT: An Interactive Visualization Tool for Interpreting Transformers

InterpreT is an interactive visualization tool for interpreting Transformer-based models, and its functionalities are demonstrated through the analysis of model behaviours for two disparate tasks: Aspect Based Sentiment Analysis and the Winograd Schema Challenge.

SQuAD2-CR: Semi-supervised Annotation for Cause and Rationales for Unanswerability in SQuAD 2.0

SQuAD2-CR dataset is released, which contains annotations on unanswerable questions from the SQuAD 2.0 dataset, to enable an explanatory analysis of the model prediction and annotate explanation on why the most plausible answer span cannot be the answer and which part of the question causes unanswerability.

Measuring Association Between Labels and Free-Text Rationales

It is demonstrated that *pipelines*, models for faithful rationalization on information-extraction style tasks, do not work as well on “reasoning” tasks requiring free-text rationales, and state-of-the-art T5-based joint models exhibit desirable properties for explaining commonsense question-answering and natural language inference.

Gradient-based Analysis of NLP Models is Manipulable

This paper merges the layers of a target model with a Facade model that overwhelms the gradients without affecting the predictions, and shows that the merged model effectively fools different analysis tools.

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

The Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models, is presented, which integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface to enable rapid exploration and error analysis.

Pathologies of Neural Models Make Interpretations Difficult

This work uses input reduction, which iteratively removes the least important word from the input, to expose pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods.
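The input-reduction loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in: the bag-of-keywords "model" and its confidence score are invented for illustration, and word importance is approximated as the confidence drop when a word is deleted.

```python
def confidence(words):
    # Stand-in "model": confidence is the fraction of words that are keywords.
    keywords = {"great", "movie"}
    return sum(w in keywords for w in words) / max(len(words), 1)

def input_reduction(words):
    """Iteratively delete the least important word while the prediction holds."""
    label = confidence(words) > 0.0          # toy prediction: any keyword present
    while len(words) > 1:
        # Candidate inputs with one word removed each.
        candidates = [words[:i] + words[i + 1:] for i in range(len(words))]
        best = max(candidates, key=confidence)   # least-important word removed
        if (confidence(best) > 0.0) != label:
            break                                # stop before the prediction flips
        words = best
    return words

print(input_reduction("a truly great movie overall".split()))
```

With a real neural model the reduced input often looks nonsensical to humans, which is exactly the pathology the paper exposes; this toy model is too simple to show that, but the loop is the same.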

"Why Should I Trust You?": Explaining the Predictions of Any Classifier

LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner by learning an interpretable model locally around the prediction.
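The core LIME idea can be sketched as follows, with every name hypothetical: perturb the input locally, query the black-box model, weight samples by proximity, and fit a weighted linear surrogate whose coefficients serve as the explanation. (The real LIME library additionally uses interpretable binary features and sparse regression.)

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in black-box classifier: sigmoid of a nonlinear score.
    return 1.0 / (1.0 + np.exp(-(X[:, 0] ** 2 - X[:, 1])))

def lime_explain(x, n_samples=500, width=0.5):
    X = x + rng.normal(scale=0.3, size=(n_samples, x.size))   # local perturbations
    y = black_box(X)                                          # query the model
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / width ** 2)    # proximity kernel
    A = np.hstack([X, np.ones((n_samples, 1))])               # intercept column
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)  # weighted fit
    return coef[:-1]                                          # local feature weights

weights = lime_explain(np.array([2.0, 0.0]))
print(weights)   # signs track the local gradient of the black box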

Annotation Artifacts in Natural Language Inference Data

It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.

Axiomatic Attribution for Deep Networks

We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms---Sensitivity and Implementation Invariance---that attribution methods ought to satisfy, and use them to guide the design of a new attribution method, Integrated Gradients.
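Integrated gradients averages the input gradient along a straight-line path from a baseline to the input. A minimal sketch, assuming a simple quadratic function f(x) = (w . x)^2 and a zero baseline (both choices are illustrative, not from the paper's experiments):

```python
import numpy as np

def integrated_gradients(x, w, baseline=None, steps=200):
    """Integrated gradients for f(x) = (w . x)^2 via a midpoint Riemann sum."""
    baseline = np.zeros_like(x) if baseline is None else baseline
    alphas = (np.arange(steps) + 0.5) / steps          # midpoints in (0, 1)
    total = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)          # point on the path
        total += 2.0 * np.dot(w, point) * w            # gradient of (w . x)^2
    return (x - baseline) * total / steps              # path integral estimate

w = np.array([1.0, 2.0])
x = np.array([3.0, 1.0])
attr = integrated_gradients(x, w)
# Completeness: attributions sum to f(x) - f(baseline) = 25 - 0.
print(attr, attr.sum())
```

The completeness property (attributions summing to the difference in outputs) is one consequence of the axioms and doubles as a quick sanity check for any implementation.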

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension

A flexible visualization library for creating customized visual analytic environments, in which the user can investigate and interrogate the relationships among the input, the model internals, and the output predictions, which in turn shed light on the model decision-making process.

Visualizing and Understanding Recurrent Networks

This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.

DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

A new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs, and presents a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.

AllenNLP: A Deep Semantic Natural Language Processing Platform

AllenNLP is described, a library for applying deep learning methods to NLP research that addresses issues with easy-to-use command-line tools, declarative configuration-driven experiments, and modular NLP abstractions.

QADiver: Interactive Framework for Diagnosing QA Models

A web-based UI is proposed that integrates visualization and analysis tools for model explanation, showing how each part of a model contributes to QA performance; this framework is expected to help QA model researchers refine and improve their models.