Do Human Rationales Improve Machine Explanations?

@inproceedings{Strout2019DoHR,
  title={Do Human Rationales Improve Machine Explanations?},
  author={Julia Strout and Ye Zhang and Raymond J. Mooney},
  booktitle={BlackboxNLP@ACL},
  year={2019}
}
Work on “learning with rationales” shows that humans providing explanations to a machine learning system can improve the system’s predictive accuracy. However, this work has not been connected to work in “explainable AI” which concerns machines explaining their reasoning to humans. In this work, we show that learning with rationales can also improve the quality of the machine’s explanations as evaluated by human judges. Specifically, we present experiments showing that, for CNN-based text… 

Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing

This review identifies 61 datasets with three predominant classes of textual explanations (highlights, free-text, and structured), organizes the literature on annotating each type, identifies strengths and shortcomings of existing collection methodologies, and gives recommendations for collecting EXNLP datasets in the future.

Teach Me to Explain: A Review of Datasets for Explainable NLP

This review identifies three predominant classes of explanations (highlights, free-text, and structured), organizes the literature on annotating each type, points to what has been learned to date, and gives recommendations for collecting EXNLP datasets in the future.

Constructing Natural Language Explanations via Saliency Map Verbalization

The results suggest that saliency map verbalization makes explanations more understandable and less cognitively challenging to humans than conventional heatmap visualization.

Explain and Predict, and then Predict Again

This work proposes ExPred, a novel yet simple approach that uses multi-task learning in the explanation generation phase, effectively trading off explanation and prediction losses, and finds that it substantially outperforms existing approaches.

ERASER: A Benchmark to Evaluate Rationalized NLP Models

This work proposes the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark to advance research on interpretable models in NLP, along with several metrics that aim to capture how well the rationales provided by models align with human rationales and how faithful these rationales are.

Why do you think that? Exploring faithful sentence-level rationales without supervision

This work proposes a differentiable training framework to create models that output faithful rationales at the sentence level by applying supervision solely on the target task, and exploits the transparent decision-making process of these models to prefer selecting the correct rationales by applying direct supervision, thereby boosting performance at the rationale level.

Rationalization for Explainable NLP: A Survey

This survey presents available methods, explainable evaluations, code, and datasets used across various NLP tasks that use rationalization, and a new subfield in Explainable AI (XAI), namely, Rational AI (RAI), is introduced to advance the current state of rationalization.

An Investigation of Language Model Interpretability via Sentence Editing

A sentence editing dataset is re-purposed as a new testbed for interpretability, where faithful, high-quality human rationales can be automatically extracted and compared with extracted model rationales, to conduct a systematic investigation of PLMs’ interpretability.

Learning to Faithfully Rationalize by Construction

Variations of this simple framework yield predictive performance superior to ‘end-to-end’ approaches, while being more general and easier to train.

UNIREX: A Unified Learning Framework for Language Model Rationale Extraction

This work proposes UNIREX, a flexible learning framework that generalizes rationale extractor optimization, and introduces the Normalized Relative Gain (NRG) metric; it finds that UNIREX-trained rationale extractors’ faithfulness can even generalize to unseen datasets and tasks.

References

Using “Annotator Rationales” to Improve Machine Learning for Text Categorization

It is hypothesized that in some situations, providing rationales is a more fruitful use of an annotator's time than annotating more examples, and a learning method is presented that exploits the rationales during training to boost performance significantly on a sample task, namely sentiment classification of movie reviews.

Deriving Machine Attention from Human Rationales

It is demonstrated that even in the low-resource scenario, attention can be learned effectively, and this approach delivers significant gains over state-of-the-art baselines, yielding over 15% average error reduction on benchmark datasets.

Rationalizing Neural Predictions

The approach combines two modular components, a generator and an encoder, which are trained to operate well together: the generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction.

Rationale-Augmented Convolutional Neural Networks for Text Classification

A sentence-level convolutional model is proposed that estimates the probability that a given sentence is a rationale, and scales the contribution of each sentence to the aggregate document representation in proportion to these estimates.

Grounding Visual Explanations

A phrase-critic model is proposed to refine generated candidate explanations augmented with flipped phrases, improving the textual explanation quality of fine-grained classification decisions on the CUB dataset by mentioning phrases that are grounded in the image.

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

It is quantitatively shown that training with the textual explanations not only yields better textual justification models, but also better localizes the evidence that supports the decision, supporting the thesis that multimodal explanation models offer significant benefits over unimodal approaches.

Towards A Rigorous Science of Interpretable Machine Learning

This position paper defines interpretability and describes when interpretability is needed (and when it is not), and suggests a taxonomy for rigorous evaluation and exposes open questions towards a more rigorous science of interpretable machine learning.

Teaching Machines to Read and Comprehend

A new methodology is defined that resolves this bottleneck and provides large-scale supervised reading comprehension data, allowing the development of a class of attention-based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.

Comparing Automatic and Human Evaluation of Local Explanations for Text Classification

A variety of local explanation approaches using automatic measures based on word deletion are evaluated, showing that an evaluation using a crowdsourcing experiment correlates moderately with these automatic measures and that a variety of other factors also impact the human judgements.

Reasoning about Entailment with Neural Attention

This paper proposes a neural model that reads two sentences to determine entailment using long short-term memory units and extends this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases, and presents a qualitative analysis of attention weights produced by this model.