Explaining Predictions of Non-Linear Classifiers in NLP

By Leila Arras, F. Horn, Grégoire Montavon, Klaus-Robert Müller, Wojciech Samek
Layer-wise relevance propagation (LRP) is a recently proposed technique for explaining predictions of complex non-linear classifiers in terms of input variables. In this paper, we apply LRP for the first time to natural language processing (NLP). More precisely, we use it to explain the predictions of a convolutional neural network (CNN) trained on a topic categorization task. Our analysis highlights which words are relevant for a specific prediction of the CNN. We compare our technique to… 
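At its core, LRP is a backward pass that redistributes the classifier's output score layer by layer until it reaches the input variables (here, words). As a rough illustration only (the function name, shapes, and the choice of the ε-stabilized redistribution rule are assumptions for this sketch, not taken from the paper), one dense layer's redistribution step might look like:

```python
import numpy as np

def lrp_epsilon(a, W, b, R_out, eps=1e-6):
    """Redistribute the output relevance R_out of one dense layer
    (pre-activations z = a @ W + b) back to its inputs, using the
    epsilon-stabilized LRP rule: R_i = a_i * sum_j w_ij * R_j / z_j."""
    z = a @ W + b                           # forward pre-activations
    s = R_out / (z + eps * np.sign(z))      # stabilized relevance "messages"
    return a * (W @ s)                      # relevance of each input

# toy layer: 3 inputs -> 2 outputs
rng = np.random.default_rng(0)
a = rng.normal(size=3)
W = rng.normal(size=(3, 2))
b = np.zeros(2)
R_out = np.array([1.0, 0.0])                # explain the first output unit
R_in = lrp_epsilon(a, W, b, R_out)
print(R_in.shape)  # (3,)
```

With a zero bias and a small ε, the rule approximately conserves relevance: the input relevances sum to the output relevance being explained.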

Looking Deeper into Deep Learning Model: Attribution-based Explanations of TextCNN

This paper presents a feature-based evaluation framework for comparing two attribution methods on customer reviews and reports extracted for Customer Due Diligence, and investigates perturbations based on removing embedded features from intermediate layers of Convolutional Neural Networks.

Evaluating neural network explanation methods using hybrid documents and morphological prediction

LIMSSE, a substring-based extension of LIME, is introduced; it produces the most successful explanations in the hybrid-document experiment and requires no manual annotations, since a relevance ground truth is generated automatically.
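As a hedged sketch of the general LIME recipe that LIMSSE extends: perturb the input, query the black-box classifier, and fit a local linear surrogate whose weights serve as relevances. The helper below is hypothetical and uses simple word-level masking rather than LIMSSE's substring sampling:

```python
import numpy as np

def lime_word_relevance(tokens, predict, n_samples=500, seed=0):
    """LIME-style relevance: randomly mask words, query the classifier
    on each perturbed input, then fit a linear surrogate from mask
    patterns to scores; its weights approximate per-word relevance."""
    rng = np.random.default_rng(seed)
    masks = rng.integers(0, 2, size=(n_samples, len(tokens)))
    ys = np.array([predict([t for t, m in zip(tokens, row) if m])
                   for row in masks])
    # least-squares fit: score ~ masks @ w + c
    X = np.hstack([masks, np.ones((n_samples, 1))])
    w, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return dict(zip(tokens, w[:-1]))        # drop the intercept

# toy black-box classifier: counts occurrences of "good"
rel = lime_word_relevance(["a", "good", "movie"],
                          lambda toks: toks.count("good"))
```

Since the toy classifier is exactly linear in the presence of "good", the surrogate recovers relevance 1 for that word and 0 for the others.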

Explaining Recurrent Neural Network Predictions in Sentiment Analysis

This work applies a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs to a word-based bi-directional LSTM model on a five-class sentiment prediction task and evaluates the resulting LRP relevances both qualitatively and quantitatively.
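The propagation rule mentioned above for multiplicative (gated) connections can be paraphrased as "signal-take-all": where a gate neuron multiplies a signal neuron, the gate receives zero relevance and the signal carrier receives all of it. A minimal sketch (names and shapes are illustrative, not the paper's code):

```python
import numpy as np

def lrp_multiplicative(gate, signal, R_out):
    """Signal-take-all redistribution for a multiplicative connection
    z = gate * signal (as in LSTM/GRU gating): the gate gets zero
    relevance, the signal carrier gets all of R_out."""
    R_gate = np.zeros_like(gate)
    R_signal = R_out.copy()
    return R_gate, R_signal

R_gate, R_signal = lrp_multiplicative(np.array([0.8]),   # gate activation
                                      np.array([1.5]),   # signal value
                                      np.array([0.42]))  # incoming relevance
print(R_gate, R_signal)  # [0.] [0.42]
```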

Evaluating Recurrent Neural Network Explanations

Using the method that performed best in the authors' experiments, it is shown how specific linguistic phenomena such as the negation in sentiment analysis reflect in terms of relevance patterns, and how the relevance visualization can help to understand the misclassification of individual samples.

Evaluating neural network explanation methods using hybrid documents and morphological agreement

This work conducts the first comprehensive evaluation of explanation methods for NLP and shows empirically that LIMSSE, LRP and DeepLIFT are the most effective explanation methods and recommends them for explaining DNNs in NLP.

Self-Explaining Structures Improve NLP Models

It is shown for the first time that interpretability does not come at the cost of performance: a neural model with self-explaining features obtains better performance than its counterpart without the self-explaining nature, achieving a new SOTA of 59.1 on SST-5 and a new SOTA of 92.3 on SNLI.

Interpreting Deep Learning Models in Natural Language Processing: A Review

In this survey, a comprehensive review of interpretation methods for neural models in NLP is provided, including a high-level taxonomy of interpretation methods in NLP; deficiencies of current methods are pointed out and some avenues for future research are suggested.

Human-grounded Evaluations of Explanation Methods for Text Classification

This paper considers several model-agnostic and model-specific explanation methods for CNNs for text classification and conducts three human-grounded evaluations, focusing on different purposes of explanations: revealing model behavior, justifying model predictions, and helping humans investigate uncertain predictions.

Variable Instance-Level Explainability for Text Classification

Evaluation on four standard text classification datasets shows that the proposed method, which extracts variable-length explanations using a set of different feature scoring methods at the instance level, consistently provides more faithful explanations than previous fixed-length and fixed-feature-scoring methods for rationale extraction.

Opening the machine learning black box with Layer-wise Relevance Propagation

This thesis describes a novel method for explaining non-linear classifier decisions by decomposing the prediction function, called Layer-wise Relevance Propagation (LRP), and applies this method to Neural Networks, kernelized Support Vector Machines and Bag of Words feature extraction pipelines.



Visualizing and Understanding Neural Models in NLP

Four strategies for visualizing compositionality in neural models for NLP, inspired by similar work in computer vision, are described, including LSTM-style gates that measure information flow and gradient back-propagation.

On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation

This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers by introducing a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks.

Analyzing Classifiers: Fisher Vectors and Deep Neural Networks

This paper extends the Layer-wise Relevance Propagation (LRP) framework to Fisher vector classifiers and uses it as an analysis tool to quantify the importance of context for classification, to qualitatively compare DNNs against FV classifiers in terms of important image regions, and to detect potential flaws and biases in data.

Visualizing and Understanding Convolutional Networks

A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models; used in a diagnostic role, it helps find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark.

Natural Language Processing (Almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.

Interpreting individual classifications of hierarchical networks

A new method is proposed, contribution propagation, that gives per-instance explanations of a trained network's classifications, and the resulting explanations are used to reveal unexpected behavior of networks that achieve high accuracy on visual object-recognition tasks using well-known data sets.

Evaluating the Visualization of What a Deep Neural Network Has Learned

A general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps and shows that the recently proposed layer-wise relevance propagation algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method.
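The region-perturbation idea summarized above can be sketched as follows: delete input features in decreasing order of their assigned relevance and record how quickly the classifier's score drops; a steeper drop indicates a more faithful explanation. This is a toy illustration under assumed names, with a linear "classifier" standing in for a DNN:

```python
import numpy as np

def perturbation_curve(x, relevance, predict, steps=5, fill=0.0):
    """Delete features in decreasing order of relevance and record the
    classifier score after each deletion; explanations whose deletions
    degrade the score fastest are considered more faithful."""
    x = x.copy()
    order = np.argsort(relevance)[::-1]     # most relevant first
    scores = [predict(x)]
    for i in order[:steps]:
        x[i] = fill                         # "remove" the feature
        scores.append(predict(x))
    return scores

# toy linear classifier: score = w . x, with gradient*input relevance
w = np.array([2.0, -1.0, 0.5, 0.0])
x = np.array([1.0, 1.0, 1.0, 1.0])
rel = w * x
scores = perturbation_curve(x, rel, lambda v: float(w @ v), steps=2)
print(scores)  # [1.5, -0.5, -1.0]
```

Averaging such score drops over many samples yields the kind of quantitative heatmap comparison the paper describes.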

Understanding Representations Learned in Deep Architectures

It is shown that consistent filter-like interpretation is possible and simple to accomplish at the unit level, and it is hoped that such techniques will allow researchers in deep architectures to understand more of how and why deep architectures work.

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

This paper addresses the visualisation of image classification models, learnt using deep Convolutional Networks (ConvNets), and establishes the connection between the gradient-based ConvNet visualisation methods and deconvolutional networks.