Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection
Hanjie Chen, Guangtao Zheng, Yangfeng Ji
Generating explanations for neural networks has become crucial for their real-world applications with respect to reliability and trustworthiness. In natural language processing, existing methods usually provide important features (words or phrases selected from an input text) as an explanation, but ignore the interactions between them. This poses a challenge for humans trying to interpret an explanation and connect it to the model prediction. In this work, we build hierarchical explanations by…
Explaining Neural Network Predictions on Sentence Pairs via Learning Word-Group Masks
The Group Mask (GMASK) method is proposed to implicitly detect word correlations by grouping correlated words from the input text pair together and measuring their contribution to the corresponding NLP task as a whole.
Local vs. Global interpretations for NLP
  • 2020
Recently, WordsWorth scores have been proposed for calculating feature importance in the context of traditional deep learning models trained for text classification tasks [Anonymous, 2021]. Here, …
Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals
This analysis suggests that pairwise explanation techniques are better suited to RC than token-level attributions, which are often unfaithful in the scenarios the authors consider. The authors also propose an improvement to an attention-based attribution technique, resulting in explanations that better reveal the model's behavior.
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
This paper extracts the most salient dependencies in each layer to construct an attribution graph, which reveals the hierarchical interactions inside Transformer, and applies self-attention attribution to identify the important attention heads, while others can be pruned with only marginal performance degradation.
LSTMS Compose — and Learn — Bottom-Up
These synthetic experiments support a specific hypothesis about how hierarchical structures are discovered over the course of training: that LSTM constituent representations are learned bottom-up, relying on effective representations of their shorter children, rather than on learning the longer-range relations independently.
BERT meets Shapley: Extending SHAP Explanations to Transformer-based Classifiers
This paper proposes the TransSHAP method that adapts SHAP to transformer models including BERT-based text classifiers, and advances SHAP visualizations by showing explanations in a sequential manner, assessed by human evaluators as competitive to state-of-the-art solutions.
Fast Hierarchical Games for Image Explanations
This work presents a model-agnostic explanation method for image classification based on a hierarchical extension of Shapley coefficients (h-Shap) that resolves some of the limitations of current approaches, is scalable, and can be computed without the need for approximation.
Towards interpreting ML-based automated malware detection models: a survey
A new taxonomy of malware detection interpretation methods is provided, building on taxonomies summarized by previous research in the field, and this survey is the first to evaluate the state-of-the-art approaches by interpretation-method attributes to generate a final score.
Integrated Directional Gradients: Feature Interaction Attribution for Neural NLP Models
In this paper, we introduce Integrated Directional Gradients (IDG), a method for attributing importance scores to groups of features, indicating their relevance to the output of a neural network.
The Logic Traps in Evaluating Post-hoc Interpretations
  • Yiming Ju, Yuanzhe Zhang, Zhao Yang, Zhongtao Jiang, Kang Liu, Jun Zhao
  • 2021
Post-hoc interpretation aims to explain a trained model and reveal how the model arrives at a decision. Though research on post-hoc interpretations has developed rapidly, one growing pain in this…


Understanding Convolutional Neural Networks for Text Classification
An analysis into the inner workings of Convolutional Neural Networks for processing text shows that filters may capture several different semantic classes of n-grams by using different activation patterns, and that global max-pooling induces behavior which separates important n-grams from the rest.
Comparing Automatic and Human Evaluation of Local Explanations for Text Classification
A variety of local explanation approaches using automatic measures based on word deletion are evaluated, showing that an evaluation using a crowdsourcing experiment correlates moderately with these automatic measures and that a variety of other factors also impact the human judgements.
Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference
This paper proposes to interpret the intermediate layers of NLI models by visualizing the saliency of attention and LSTM gating signals and presents several examples for which these methods are able to reveal interesting insights and identify the critical information contributing to the model decisions.
Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?
This paper evaluates and compares convolutional neural networks for the task of morphological tagging on three morphologically different languages and shows that these models implicitly discover understandable linguistic rules.
Can I trust you more? Model-Agnostic Hierarchical Explanations
Mahe provides context-dependent explanations by a novel local interpretation algorithm that effectively captures any-order interactions, and obtains context-free explanations through generalizing context-dependent interactions to explain global behaviors.
Understanding Neural Networks through Representation Erasure
This paper proposes a general methodology to analyze and interpret decisions from a neural model by observing the effects on the model of erasing various parts of the representation, such as input word-vector dimensions, intermediate hidden units, or input words.
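The erasure idea can be illustrated with a minimal sketch (a toy model and hypothetical function names for illustration, not the paper's implementation): score each input word by how much the model's confidence in the predicted class drops when that word is removed.

```python
def erasure_importance(predict, words):
    """Score each word by the confidence drop caused by erasing it.

    predict: maps a list of words to a probability for the target class.
    """
    base = predict(words)
    scores = []
    for i in range(len(words)):
        reduced = words[:i] + words[i + 1:]     # erase word i
        scores.append(base - predict(reduced))  # drop in confidence
    return scores

# Toy model: confidence grows with the count of positive-sentiment words.
POSITIVE = {"great", "fun"}
toy_predict = lambda ws: sum(w in POSITIVE for w in ws) / 3.0

print(erasure_importance(toy_predict, ["a", "great", "movie"]))
# → [0.0, 0.3333333333333333, 0.0]  ("great" carries all the importance)
```

The same loop generalizes to erasing hidden units or word-vector dimensions by zeroing them out instead of deleting tokens.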
Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs
The driving force behind the recent success of LSTMs has been their ability to learn complex and non-linear relationships. Consequently, our inability to describe these relationships has led to LSTMs…
Hierarchical interpretations for neural network predictions
This work introduces the use of hierarchical interpretations to explain DNN predictions through the proposed method, agglomerative contextual decomposition (ACD), and demonstrates that ACD enables users both to identify the more accurate of two DNNs and to better trust a DNN's outputs.
Enhancing and Combining Sequential and Tree LSTM for Natural Language Inference
This paper presents a new state-of-the-art result, achieving an accuracy of 88.3% on the standard benchmark, the Stanford Natural Language Inference dataset, through an enhanced sequential encoding model, which outperforms the previous best model that employs more complicated network architectures.
Rationalizing Neural Predictions
The approach combines two modular components, a generator and an encoder, which are trained to operate well together; it specifies a distribution over text fragments as candidate rationales, which are passed through the encoder for prediction.