Evaluating Explanations: How Much Do Explanations from the Teacher Aid Students?

Authors: Danish Pruthi, Bhuwan Dhingra, Livio Baldini Soares, Michael Collins, Zachary Chase Lipton, Graham Neubig, William W. Cohen
Journal: Transactions of the Association for Computational Linguistics
While many methods purport to explain predictions by highlighting salient features, what aims these explanations serve and how they ought to be evaluated often go unstated. In this work, we introduce a framework to quantify the value of explanations via the accuracy gains that they confer on a student model trained to simulate a teacher model. Crucially, the explanations are available to the student during training, but are not available at test time. Compared with prior proposals, our approach… 
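The quantity described in the abstract, the accuracy gain that explanations confer on a student simulating a teacher, can be illustrated with a minimal sketch. The function names and the toy predictions below are hypothetical and do not come from the paper; the sketch assumes two students have already been trained (one with explanations, one without) and only shows how their test-time agreement with the teacher might be compared.

```python
from typing import Sequence


def simulation_accuracy(student_preds: Sequence[int],
                        teacher_preds: Sequence[int]) -> float:
    """Fraction of test inputs on which the student matches the teacher."""
    if len(student_preds) != len(teacher_preds):
        raise ValueError("prediction lists must have the same length")
    if len(teacher_preds) == 0:
        raise ValueError("need at least one prediction")
    matches = sum(s == t for s, t in zip(student_preds, teacher_preds))
    return matches / len(teacher_preds)


def explanation_gain(with_expl: Sequence[int],
                     without_expl: Sequence[int],
                     teacher_preds: Sequence[int]) -> float:
    """Gain in simulation accuracy attributable to training with explanations.

    Both students are evaluated without explanations at test time;
    only their training differed (an assumption of this sketch).
    """
    return (simulation_accuracy(with_expl, teacher_preds)
            - simulation_accuracy(without_expl, teacher_preds))


# Toy data (hypothetical): teacher outputs and two students' test predictions.
teacher = [1, 0, 1, 1, 0, 1, 0, 0]
student_plain = [1, 0, 0, 1, 1, 1, 0, 1]      # trained without explanations
student_explained = [1, 0, 1, 1, 1, 1, 0, 0]  # trained with explanations

gain = explanation_gain(student_explained, student_plain, teacher)
print(f"gain in simulation accuracy: {gain:.3f}")
```

A positive gain would suggest the explanations carried information that helped the student imitate the teacher; a gain near zero would suggest they did not, at least for this student and training budget.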

Learning to Scaffold: Optimizing Model Explanations for Teaching

This work trains models on three natural language processing and computer vision tasks, and finds that students trained with explanations extracted with this framework are able to simulate the teacher more effectively than ones produced with previous methods.

When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data

This paper provides a formal framework for characterizing approaches to learning from explanation data, proposes a synthetic task for studying how models learn from such data, and gives graphical models for the available modeling approaches.

Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations

In a crowdsourcing study where participants interact with deception detection models trained to distinguish between genuine and fake hotel reviews, it is observed that, for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase than the no-explanation control.

How (Not) To Evaluate Explanation Quality

This paper substantiates theoretical claims (i.e., the lack of validity and temporal decline of currently-used proxy scores) with empirical evidence from a crowdsourcing case study in which it investigates the explanation quality of state-of-the-art explainable question answering systems.

Diagnostics-Guided Explanation Generation

This work shows how to directly optimise for Faithfulness and Confidence Indication when training a model to generate sentence-level explanations, which markedly improves explanation quality, agreement with human rationales, and downstream task performance on three complex reasoning tasks.

What I Cannot Predict, I Do Not Understand: A Human-Centered Evaluation Framework for Explainability Methods

The results demonstrate that the degree to which individual attribution methods help human participants better understand an AI system varies widely across these scenarios, suggesting the need to move beyond quantitative improvements of current attribution methods, towards the development of complementary approaches that provide qualitatively different sources of information to human end-users.

ExSum: From Local Explanations to Model Understanding

ExSum, a mathematical framework for quantifying model understanding, is introduced, and metrics for its quality assessment are proposed, which connect understandability to other properties of explanations such as human alignment, robustness, and counterfactual similarity and plausibility.

Use-Case-Grounded Simulations for Explanation Evaluation

Evidence is provided that SimEvals can be used to screen an important set of user study design decisions, e.g. selecting which explanations should be presented to the user, before running a potentially costly user study.

Simulated User Studies for Explanation Evaluation

This work provides a two-phase framework to conduct simulated user evaluations and demonstrates that, by instantiating this framework for local explanations, it can be used to recreate findings from existing user studies for two use cases (identifying data bugs and performing forward simulation).

A survey on improving NLP models with human explanations

An overview of different methods for learning from human explanations is given, and different factors that can inform the decision of which method to choose for a specific use-case are discussed.



Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?

Human subject tests are carried out that are the first of their kind to isolate the effect of algorithmic explanations on a key aspect of model interpretability, simulatability, while avoiding important confounding experimental factors.

Do explanations make VQA models more predictable to a human?

This work analyzes if existing explanations indeed make a VQA model — its responses as well as failures — more predictable to a human, and finds that they do not, and that human-in-the-loop approaches that treat the model as a black-box do.

Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language?

A leakage-adjusted simulatability (LAS) metric is introduced for evaluating NL explanations, which measures how well explanations help an observer predict a model’s output, while controlling for how explanations can directly leak the output.

QED: A Framework and Dataset for Explanations in Question Answering

A large user study is described showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.

Reading Tea Leaves: How Humans Interpret Topic Models

New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

LIME is proposed, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction.

Do Context-Aware Translation Models Pay the Right Attention?

This paper introduces SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations that professional translators found useful for pronoun disambiguation, and performs an in-depth analysis of the context used to disambiguate.

Modeling Annotators: A Generative Approach to Learning from Annotator Rationales

A generative model of how a given annotator, knowing the true θ, stochastically chooses rationales is presented, and observing the rationales helps us infer the true θ.

The Explanation Game: Towards Prediction Explainability through Sparse Communication

This work provides a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier’s decision, and compares several explainers, including gradient methods, erasure, and attention mechanisms, in terms of their communication success.

Towards Prediction Explainability through Sparse Communication

This work provides a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier's decision, and uses this framework to compare several prior approaches for extracting explanations, including gradient methods, representation erasure, and attention mechanisms.