Is Attention Interpretable?

@article{Serrano2019IsAI,
  title={Is Attention Interpretable?},
  author={Sofia Serrano and Noah A. Smith},
  journal={ArXiv},
  year={2019},
  volume={abs/1906.03731}
}
Attention mechanisms have recently boosted performance on a range of NLP tasks. [...] Key result: We conclude that while attention noisily predicts input components’ overall importance to a model, it is by no means a fail-safe indicator.
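To make the key result concrete, here is a minimal sketch of the kind of importance test the paper describes: erase the highest-attention components one at a time and count how many erasures it takes to flip the model's decision. The names attn, hidden, and logits_fn are hypothetical stand-ins for a generic attention-based classifier, not the authors' code.

import numpy as np

def decision_flips_after(attn, hidden, logits_fn):
    # attn:      (T,) attention weights over T encoder states (sums to 1)
    # hidden:    (T, d) encoder states the attention is applied over
    # logits_fn: hypothetical callable mapping a context vector to class logits
    # Returns how many components, erased in decreasing order of attention,
    # are needed before the predicted class changes.
    base_pred = np.argmax(logits_fn(attn @ hidden))
    a = attn.copy()
    for k, i in enumerate(np.argsort(-attn), start=1):
        a[i] = 0.0                                   # zero out the next-highest-attention item
        renorm = a / a.sum() if a.sum() > 0 else a   # renormalise the remaining weights
        if np.argmax(logits_fn(renorm @ hidden)) != base_pred:
            return k                                 # decision flipped after k erasures
    return len(attn)                                 # the decision never flipped

If attention were a reliable importance indicator, this count would usually be small; the paper finds it often is not.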
Understanding Attention for Text Classification
TLDR
A study of the internal mechanism of attention that looks into the gradient update process and proposes to analyze, for each word token, two quantities: its polarity score and its attention score, where the latter is a global assessment of the token’s significance.
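As an illustration of the two quantities, a hedged sketch for a single-query dot-product attention classifier (the vectors q and w_out and the shapes are assumptions for illustration, not the paper's exact formulation):

import numpy as np

def token_scores(h, q, w_out):
    # h:     (T, d) token representations
    # q:     (d,)   attention query; produces each token's attention score
    # w_out: (d,)   output/classification direction; produces each token's polarity score
    attn = np.exp(h @ q)
    attn /= attn.sum()                       # attention score: a global weighting of the token
    polarity = h @ w_out                     # polarity score: which class the token pushes toward
    return attn, polarity, attn * polarity   # a token's net contribution needs both to be large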
Why is Attention Not So Interpretable?
TLDR
Methods are proposed to mitigate the issue of combinatorial shortcuts in attention weights, and they are shown to effectively improve the interpretability of attention mechanisms on a variety of datasets.
Why Attentions May Not Be Interpretable?
TLDR
It is demonstrated that one root cause of this phenomenon is combinatorial shortcuts: in addition to the highlighted parts, the attention weights themselves may carry extra information that can be exploited by downstream models after the attention layers.
Why is Attention Not So Interpretable?
TLDR
This work theoretically analyzes combinatorial shortcuts, designs an intuitive experiment to demonstrate their existence, and proposes two methods to mitigate the issue, showing that the proposed methods can effectively improve the interpretability of attention mechanisms on a variety of datasets.
Attention Interpretability Across NLP Tasks
TLDR
This work attempts to fill the gap by giving a comprehensive explanation that justifies both kinds of observations (i.e., when attention is interpretable and when it is not) and reinforces the claim of attention’s interpretability through manual evaluation.
Why is Attention Not So Attentive?
TLDR
It is revealed that one root cause of this phenomenon is combinatorial shortcuts: models may obtain information not only from the parts highlighted by attention mechanisms but also from the attention weights themselves.
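A toy illustration of the shortcut these papers describe (the setup is hypothetical): if the attended values are constants, any input information that reaches the output must travel through the attention weights themselves rather than through the highlighted content.

import numpy as np

v = np.array([0.0, 1.0])                      # constant, input-independent values

def attended_output(x):
    scores = 3.0 * x                          # attention scores still depend on the input
    a = np.exp(scores) / np.exp(scores).sum()
    return a @ v                              # weighted sum of constants

print(attended_output(np.array([1.0, 0.0])))  # ~0.05: the two inputs remain separable
print(attended_output(np.array([0.0, 1.0])))  # ~0.95: purely via the attention weights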
Improving the Faithfulness of Attention-based Explanations with Task-specific Information for Text Classification
TLDR
A new family of Task-Scaling (TaSc) mechanisms that learn task-specific non-contextualised information to scale the original attention weights is proposed, and TaSc is shown to consistently provide more faithful attention-based explanations than three widely-used interpretability techniques.
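Roughly, the scaling idea can be sketched as follows (an illustrative PyTorch module, not the authors' implementation; the class and parameter names are assumptions): each vocabulary item gets a learned, task-specific scalar that rescales its contextual attention weight before the weighted sum over hidden states is taken.

import torch
import torch.nn as nn

class TaskScaledAttention(nn.Module):
    # Illustrative sketch: a learned, non-contextualised scalar per token type
    # rescales the contextual attention weights.
    def __init__(self, vocab_size):
        super().__init__()
        self.token_scale = nn.Embedding(vocab_size, 1)

    def forward(self, attn, token_ids):
        # attn: (B, T) contextual attention weights; token_ids: (B, T) vocabulary indices
        s = self.token_scale(token_ids).squeeze(-1)  # (B, T) task-specific, non-contextual scales
        return attn * s                              # scaled weights then weight the hidden states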
Is Sparse Attention more Interpretable?
TLDR
It is observed in this setting that inducing sparsity may make it less plausible that attention can be used as a tool for understanding model behavior.
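For reference, sparsity in attention is commonly induced by replacing softmax with sparsemax (Martins and Astudillo, 2016), which projects the scores onto the probability simplex and zeroes out low-scoring items; a minimal NumPy version:

import numpy as np

def sparsemax(z):
    # Euclidean projection of the score vector z onto the probability simplex;
    # unlike softmax, many entries of the result are exactly zero.
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    k = ks[1 + ks * z_sorted > cumsum][-1]   # size of the support
    tau = (cumsum[k - 1] - 1.0) / k          # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

print(sparsemax([2.0, 1.2, 0.1, -1.0]))      # [0.9 0.1 0.  0. ] -- only two tokens receive attention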
Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models
TLDR
The visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and among attention heads in Transformer-based language models, and to help users gain insight into how a classification decision is made.
A Song of (Dis)agreement: Evaluating the Evaluation of Explainable Artificial Intelligence in Natural Language Processing
TLDR
It is argued that the community should stop using rank correlation as an evaluation metric for attention-based explanations and instead test various explanation methods and employ a human-in-the-loop process to determine if the explanations align with human intuition for the particular use case at hand.
...

References

Showing 1-10 of 39 references
Attention is not Explanation
TLDR
This work performs extensive experiments across a variety of NLP tasks to assess the degree to which attention weights provide meaningful “explanations” for predictions, and finds that they largely do not.
Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference
TLDR
This paper proposes to interpret the intermediate layers of NLI models by visualizing the saliency of attention and LSTM gating signals and presents several examples for which their methods are able to reveal interesting insights and identify the critical information contributing to the model decisions.
Effective Approaches to Attention-based Neural Machine Translation
TLDR
A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
Visualizing and Understanding Neural Models in NLP
TLDR
Four strategies for visualizing compositionality in neural models for NLP, inspired by similar work in computer vision, are described, including LSTM-style gates that measure information flow and gradient back-propagation.
Learning Structured Text Representations
TLDR
A model is proposed that can encode a document while automatically inducing rich structural dependencies; it embeds a differentiable non-projective parsing algorithm into a neural model and uses attention mechanisms to incorporate the structural biases.
Rationalizing Neural Predictions
TLDR
The approach combines two modular components, a generator and an encoder, which are trained to operate well together; the generator specifies a distribution over text fragments as candidate rationales, and these are passed through the encoder for prediction.
Pathologies of Neural Models Make Interpretations Difficult
TLDR
This work uses input reduction, which iteratively removes the least important word from the input, to expose pathological behaviors of neural models: the remaining words appear nonsensical to humans and are not the ones determined as important by interpretation methods.
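A schematic version of the input-reduction loop (the predict and importance callables are hypothetical placeholders for a model's label prediction and a per-token importance method):

def input_reduction(tokens, predict, importance):
    # Repeatedly drop the least important remaining token while the model's
    # prediction stays unchanged; the remnant is often nonsensical to humans.
    original = predict(tokens)
    while len(tokens) > 1:
        scores = importance(tokens)               # one importance score per remaining token
        candidate = list(tokens)
        del candidate[scores.index(min(scores))]  # remove the least important token
        if predict(candidate) != original:
            break                                 # stop just before the prediction changes
        tokens = candidate
    return tokens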
Comparing Automatic and Human Evaluation of Local Explanations for Text Classification
TLDR
A variety of local explanation approaches using automatic measures based on word deletion are evaluated, showing that an evaluation using a crowdsourcing experiment correlates moderately with these automatic measures and that a variety of other factors also impact the human judgements.
Explaining Predictions of Non-Linear Classifiers in NLP
TLDR
This paper applies layer-wise relevance propagation for the first time to natural language processing (NLP) and uses it to explain the predictions of a convolutional neural network trained on a topic categorization task.
Understanding Neural Networks through Representation Erasure
TLDR
This paper proposes a general methodology to analyze and interpret decisions from a neural model by observing the effects on the model of erasing various parts of the representation, such as input word-vector dimensions, intermediate hidden units, or input words.
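The erasure idea can be sketched in a few lines (a hedged sketch; loss_fn is a hypothetical callable from a representation to a scalar loss, and zero is used here as the erasure baseline):

import numpy as np

def erasure_importance(x, loss_fn, baseline=0.0):
    # Importance of dimension i = change in loss when x[i] is replaced by the baseline.
    base = loss_fn(x)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        erased = x.copy()
        erased[i] = baseline                 # erase one dimension (or one word's slot)
        scores[i] = loss_fn(erased) - base   # larger increase in loss => more important
    return scores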
...