Attention Meets Perturbations: Robust and Interpretable Attention With Adversarial Training

Shunsuke Kitada and Hitoshi Iyatomi. IEEE Access.
Although attention mechanisms have been applied to a variety of deep learning models and have been shown to improve prediction performance, they have been reported to be vulnerable to perturbations. To overcome this vulnerability, we draw on adversarial training (AT), a powerful regularization technique for enhancing the robustness of models. In this paper, we propose a general training technique for natural language…
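The core idea, applying AT to the attention mechanism itself, can be sketched in a few lines. Below is a minimal numpy illustration with made-up values (the toy model and all names are hypothetical, not the paper's actual architecture): the raw attention scores are perturbed in the gradient-sign direction, and training would then also minimize the loss under that perturbation.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Toy setup: the prediction is an attention-weighted average of per-word
# values, and the loss is squared error against a target.
values = np.array([1.0, -1.0, 0.5])   # hypothetical per-word values
scores = np.array([0.5, 0.2, 0.1])    # raw (pre-softmax) attention scores
target = 0.0

p = softmax(scores)
pred = p @ values
loss = (pred - target) ** 2

# Analytic gradient of the loss w.r.t. the raw attention scores,
# via the softmax Jacobian: dL/da_i = 2 (pred - y) p_i (v_i - pred).
grad = 2.0 * (pred - target) * p * (values - pred)

# FGSM-style adversarial perturbation of the attention scores.
eps = 0.1
adv_scores = scores + eps * np.sign(grad)
adv_loss = (softmax(adv_scores) @ values - target) ** 2
# Adversarial training would now minimize loss + adv_loss, making the
# attention distribution robust to this worst-case perturbation.
```

The perturbed scores yield a strictly larger loss than the clean ones, which is exactly the signal AT exploits as a regularizer.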

Making Attention Mechanisms More Robust and Interpretable with Virtual Adversarial Training for Semi-Supervised Text Classification
A new general training technique for attention mechanisms based on virtual adversarial training (VAT), which provides significantly better prediction performance, demonstrates a stronger correlation with word importance, and shows better agreement with evidence provided by humans.
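VAT replaces the label-dependent adversarial loss with a label-free smoothness term: the KL divergence between the model's output distribution at an input and at a slightly perturbed input. A minimal sketch on a toy linear softmax classifier with hypothetical values (full VAT estimates the worst-case direction by power iteration; a random unit direction stands in here):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# Toy linear softmax classifier (weights and input are made up).
W = np.array([[1.0, -0.5], [-1.0, 0.5], [0.2, 0.3]])
x = np.array([0.8, -0.3])

# A unit-norm perturbation direction; full VAT would refine this
# direction with a few power-iteration steps instead.
rng = np.random.default_rng(0)
d = rng.normal(size=x.shape)
r = 0.05 * d / np.linalg.norm(d)

p_clean = softmax(W @ x)
p_pert = softmax(W @ (x + r))
# The VAT regularizer: no labels needed, only output smoothness.
vat_loss = kl(p_clean, p_pert)
```

Because no label appears in `vat_loss`, the same term can be computed on unlabeled data, which is what makes VAT suitable for semi-supervised training.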
Improved Text Classification via Contrastive Adversarial Training
A simple and general method to regularize the fine-tuning of Transformer-based encoders for text classification: adversarial examples are generated by perturbing the model's word embeddings, and contrastive learning on clean and adversarial examples teaches the model to learn noise-invariant representations.
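The contrastive part can be sketched with an NT-Xent/InfoNCE-style loss that pulls each clean embedding toward its own adversarial counterpart and pushes it away from the rest of the batch. A hypothetical numpy sketch (batch size, dimensions, and perturbation scale are made up):

```python
import numpy as np

def info_nce(clean, adv, tau=0.1):
    """NT-Xent-style loss: each clean embedding should be most similar
    to its own adversarial counterpart within the batch."""
    c = clean / np.linalg.norm(clean, axis=1, keepdims=True)
    a = adv / np.linalg.norm(adv, axis=1, keepdims=True)
    logits = c @ a.T / tau                               # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 8))               # hypothetical sentence embeddings
adv = clean + 0.01 * rng.normal(size=(4, 8))  # small adversarial perturbation
loss = info_nce(clean, adv)
```

When the pairing is shuffled the loss rises sharply, which is the gradient signal that drives the representations toward noise invariance.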
Robust Multilingual Part-of-Speech Tagging via Adversarial Training
It is found that AT not only improves overall tagging accuracy, but also prevents overfitting in low-resource languages and boosts tagging accuracy for rare and unseen words.
Interpretable Adversarial Perturbation in Input Embedding Space for Text
This paper restores interpretability to adversarial training methods by restricting the directions of perturbations toward existing words in the input embedding space. Each perturbed input can then be straightforwardly reconstructed as actual text, by treating the perturbation as the replacement of words in a sentence, while maintaining or even improving task performance.
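The direction restriction can be sketched as follows: instead of a free-direction perturbation, the input embedding moves toward whichever vocabulary word best aligns with the loss gradient, so the perturbed input reads as a word replacement. A hypothetical numpy sketch (vocabulary, gradient, and step size are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.normal(size=(5, 4))   # hypothetical embedding matrix, 5 words
e = vocab[0]                      # embedding of the current input word
grad = rng.normal(size=4)         # stand-in for the loss gradient at e

# Directions from the input word toward every other vocabulary word.
dirs = vocab[1:] - e

# Restrict the perturbation to the word direction that best aligns with
# the gradient, so the perturbed input reads as a word replacement.
k = int(np.argmax(dirs @ grad))
eps = 0.1
r = eps * dirs[k] / np.linalg.norm(dirs[k])
e_adv = e + r   # interpretable: "move e toward vocabulary word k+1"
```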
Explaining and Harnessing Adversarial Examples
It is argued that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature, supported by new quantitative results, while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets.
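This paper's fast gradient sign method (FGSM) builds an adversarial example as x_adv = x + eps * sign(grad_x L(x)). A minimal sketch on a toy linear model with squared-error loss (all values are made up):

```python
import numpy as np

# Toy linear model with squared-error loss L(x) = (w.x - y)^2;
# its input gradient is 2 (w.x - y) w.
w = np.array([0.5, -1.0, 0.25])
x = np.array([1.0, 0.5, -0.5])
y = 0.0

pred = w @ x
grad = 2.0 * (pred - y) * w

# Fast gradient sign method: x_adv = x + eps * sign(grad_x L).
eps = 0.1
x_adv = x + eps * np.sign(grad)

loss = (pred - y) ** 2
adv_loss = (w @ x_adv - y) ** 2
```

On a linear model the sign step increases the loss by eps times the L1 norm of the gradient, which is the "linear nature" argument the abstract refers to.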
Attention is All you Need
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
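The Transformer's scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, can be sketched directly. A minimal numpy version (shapes and values are arbitrary):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the Transformer's core operation."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries of dimension 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 4))   # 3 values
out, w = scaled_dot_product_attention(Q, K, V)
```

Each row of `w` is a probability distribution over the keys, and `out` is the corresponding convex combination of the values.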
Adversarial Training Methods for Semi-Supervised Text Classification
This work extends adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself.
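The key move here is to perturb the continuous word embeddings rather than discrete tokens, using an L2-normalized adversarial direction r = eps * g / ||g||. A hypothetical numpy sketch of that perturbation on a toy averaged-embedding classifier (the model and values are made up; the original work computes g by backpropagation, while central differences keep this sketch self-contained):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy classifier: average the word embeddings, apply a linear layer.
emb = np.array([[0.2, 0.4], [-0.3, 0.1], [0.5, -0.2]])  # one row per token
W = np.array([[1.0, -1.0], [-0.5, 0.5]])                # 2-class weights
label = 0

def nll(e):
    return -np.log(softmax(W @ e.mean(axis=0))[label])

# Gradient of the loss w.r.t. the embeddings by central differences
# (the original work uses backpropagation instead).
g = np.zeros_like(emb)
h = 1e-5
for idx in np.ndindex(*emb.shape):
    d = np.zeros_like(emb)
    d[idx] = h
    g[idx] = (nll(emb + d) - nll(emb - d)) / (2 * h)

# L2-normalized adversarial perturbation on the embeddings.
eps = 0.02
r = eps * g / np.linalg.norm(g)
adv_loss = nll(emb + r)
```

Training on `adv_loss` in addition to the clean loss is the embedding-space AT this abstract describes; the VAT variant swaps the label-dependent loss for a KL smoothness term.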
A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Samples
This paper investigates the topological relationship between two (pseudo)metric spaces corresponding to the predictor and the oracle, and develops necessary and sufficient conditions that can determine if a classifier is always robust (strong-robust) against adversarial examples according to f_2.
Understanding adversarial training: Increasing local stability of supervised models through robust optimization
The proposed framework generalizes adversarial training, as well as previous approaches for increasing local stability of ANNs, and increases the robustness of the network to existing adversarial examples, while making it harder to generate new ones.
Delving into Transferable Adversarial Examples and Black-box Attacks
This work is the first to conduct an extensive study of transferability over large models and a large-scale dataset, and it is also the first to study the transferability of targeted adversarial examples with their target labels.
Attention is not Explanation
This work performs extensive experiments across a variety of NLP tasks to assess the degree to which attention weights provide meaningful “explanations” for predictions, and finds that they largely do not.
Effective Approaches to Attention-based Neural Machine Translation
A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.