Corpus ID: 25717172

Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs

@article{Murdoch2018BeyondWI,
  title={Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs},
  author={W. James Murdoch and Peter J. Liu and Bin Yu},
  journal={ArXiv},
  year={2018},
  volume={abs/1801.05453}
}
The driving force behind the recent success of LSTMs has been their ability to learn complex and non-linear relationships. Consequently, our inability to describe these relationships has led to LSTMs being characterized as black boxes. To this end, we introduce contextual decomposition (CD), an interpretation algorithm for analysing individual predictions made by standard LSTMs, without any changes to the underlying model. By decomposing the output of an LSTM, CD captures the contributions of…
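
Roughly, the decomposition that the abstract refers to can be summarized as follows (a paraphrase using the β/γ notation of the full paper, which does not appear in the snippet above): for a chosen phrase, CD splits each cell and hidden state of the LSTM into a contribution arising from that phrase and a contribution arising from everything else, so the unnormalized prediction score splits additively:

\[
c_t = \beta_t^{c} + \gamma_t^{c}, \qquad h_t = \beta_t + \gamma_t, \qquad W h_T = W\beta_T + W\gamma_T .
\]

Here \(W\beta_T\) is read as the score attributable to the selected words or phrase in the final prediction \(\mathrm{softmax}(W h_T)\), while \(W\gamma_T\) collects the terms involving the remaining words and their interactions.
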
Citations

Evaluating Recurrent Neural Network Explanations
TLDR: Using the method that performed best in the authors' experiments, it is shown how specific linguistic phenomena such as negation in sentiment analysis are reflected in the relevance patterns, and how relevance visualization can help to understand the misclassification of individual samples.
On Attribution of Recurrent Neural Network Predictions via Additive Decomposition
TLDR: Comprehensive analysis shows that the proposed novel attribution method, called REAT, could unveil the useful linguistic knowledge captured by RNNs, and could be utilized as a debugging tool to examine the vulnerability and failure reasons of RNNs.
How recurrent networks implement contextual processing in sentiment analysis
TLDR: This work proposes general methods for reverse engineering recurrent neural networks (RNNs) to identify and elucidate contextual processing, and applies these methods to understand RNNs trained on sentiment classification.
Attribution Analysis of Grammatical Dependencies in LSTMs
TLDR: Using layer-wise relevance propagation, it is shown that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns, suggesting that LSTM language models are able to infer robust representations of syntactic dependencies.
Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models
The impressive performance of neural networks on natural language processing tasks attributes to their ability to model complicated word and phrase compositions. To explain how the model handles…
Dissecting Contextual Word Embeddings: Architecture and Representation
TLDR: There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM Language Models
TLDR: A causal account of structural properties as carried by paths across gates and neurons of a recurrent neural network is introduced, offering a finer and more complete view of an LSTM's handling of this structural aspect of the English language than prior results based on diagnostic classifiers and ablation.
Hierarchical interpretations for neural network predictions
TLDR: This work introduces the use of hierarchical interpretations to explain DNN predictions through the proposed method, agglomerative contextual decomposition (ACD), and demonstrates that ACD enables users both to identify the more accurate of two DNNs and to better trust a DNN's outputs.
LSTMs Compose — and Learn — Bottom-Up
TLDR: These synthetic experiments support a specific hypothesis about how hierarchical structures are discovered over the course of training: that LSTM constituent representations are learned bottom-up, relying on effective representations of their shorter children, rather than on learning the longer-range relations independently.

References

Showing 1-10 of 21 references
Automatic Rule Extraction from Long Short Term Memory Networks
TLDR: By identifying consistently important patterns of words, this paper is able to distill state-of-the-art LSTMs on sentiment analysis and question answering into a set of representative phrases; this is quantitatively validated by using the extracted phrases to construct a simple, rule-based classifier which approximates the output of the LSTM.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
TLDR: This work presents a Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and poses new challenges for sentiment compositionality, and introduces the Recursive Neural Tensor Network.
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
TLDR: The Tree-LSTM is introduced, a generalization of LSTMs to tree-structured network topologies that outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences and sentiment classification.
Visualizing and Understanding Recurrent Networks
TLDR: This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
Understanding Neural Networks through Representation Erasure
TLDR: This paper proposes a general methodology to analyze and interpret decisions from a neural model by observing the effects on the model of erasing various parts of the representation, such as input word-vector dimensions, intermediate hidden units, or input words.
Sequence to Sequence Learning with Neural Networks
TLDR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks
TLDR: This work presents LSTMVis, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics, and describes the domain, the different stakeholders, and their goals and tasks.
GloVe: Global Vectors for Word Representation
TLDR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
On the State of the Art of Evaluation in Neural Language Models
TLDR: This work reevaluates several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrives at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models.
A Neural Attention Model for Abstractive Sentence Summarization
TLDR: This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.