LSTMs Exploit Linguistic Attributes of Data

@inproceedings{Liu2018LSTMsEL,
  title={LSTMs Exploit Linguistic Attributes of Data},
  author={Nelson F. Liu and Omer Levy and Roy Schwartz and Chenhao Tan and Noah A. Smith},
  booktitle={Rep4NLP@ACL},
  year={2018}
}
While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. We investigate how the properties of natural language data affect an LSTM's ability to learn a nonlinguistic task: recalling elements from its input. We find that models trained on natural language data are able to recall tokens from much longer sequences than models trained on non-language sequential data. Furthermore, we show that the LSTM learns to solve the memorization task by explicitly using a subset of its neurons to count timesteps in the input. We hypothesize that the patterns and structure in natural language data enable LSTMs to learn by providing approximate ways of reducing loss, but understanding the effect of different training…
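
As a concrete illustration (not the paper's exact setup), the sketch below trains an LSTM in PyTorch on a toy recall task, reporting the sequence's first token after reading the whole sequence, and then crudely probes for units whose activations track the timestep index. The model name, sizes, hyperparameters, and the choice to probe per-step hidden states rather than cell states are all assumptions made here for brevity.

# Minimal, illustrative sketch of a token-recall task and a probe for
# "counter-like" units. Assumptions: PyTorch; toy sizes; target = first token.
import torch
import torch.nn as nn

VOCAB, SEQ_LEN, HIDDEN = 50, 10, 100

class Recaller(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.lstm = nn.LSTM(32, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))   # h: (batch, seq, hidden)
        return self.out(h[:, -1]), h      # classify from the final hidden state

model = Recaller()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(300):                   # toy training on uniform random tokens
    x = torch.randint(0, VOCAB, (64, SEQ_LEN))
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits, x[:, 0])   # target: the first token
    opt.zero_grad()
    loss.backward()
    opt.step()

# Probe for units whose mean activation correlates with the timestep index.
# The paper reports neurons that count timesteps; per-step hidden states are
# used here simply because nn.LSTM exposes them directly.
with torch.no_grad():
    x = torch.randint(0, VOCAB, (256, SEQ_LEN))
    _, h = model(x)
    acts = h.mean(dim=0)                  # (seq, hidden)
    t = torch.arange(SEQ_LEN, dtype=torch.float)
    corrs = torch.stack([torch.corrcoef(torch.stack([acts[:, i], t]))[0, 1]
                         for i in range(HIDDEN)]).nan_to_num()
    print("most timestep-correlated units:", corrs.abs().topk(5).indices.tolist())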

Citations

On Evaluating the Generalization of LSTM Models in Formal Languages
TLDR
This paper empirically evaluates the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, to learn simple formal languages, in particular aⁿbⁿ, aⁿbⁿcⁿ, and aⁿbⁿcⁿdⁿ.
Understanding Learning Dynamics Of Language Models with SVCCA
TLDR
The first study of the learning dynamics of neural language models is presented, using a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which enables comparison of learned representations across time and across models without the need to evaluate directly on annotated data.
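
For reference, here is a minimal numpy sketch of the SVCCA similarity score as it is usually formulated (SVD to discard low-variance directions of each representation, then CCA between the reduced views, summarized as the mean canonical correlation). The variance threshold and function names are assumptions for illustration, not details taken from this paper.

# Minimal SVCCA sketch: SVD-reduce each activation matrix, then CCA between
# the reduced views via principal angles of their (centered) column spaces.
import numpy as np

def svcca(X, Y, var_kept=0.99):
    """X, Y: (n_samples, n_neurons) activations from two models or checkpoints."""
    def svd_reduce(A):
        A = A - A.mean(axis=0)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_kept) + 1
        return U[:, :k] * s[:k]            # top-k singular directions

    def orthobasis(A):
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        return U                           # orthonormal basis of A's column space

    Qx, Qy = orthobasis(svd_reduce(X)), orthobasis(svd_reduce(Y))
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)   # canonical correlations
    return rho.mean()

# Toy usage with unrelated random "activations".
print(round(svcca(np.random.randn(500, 100), np.random.randn(500, 80)), 3))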
Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models
TLDR
The experimental results show that pretraining with an artificial language with a nesting dependency structure provides some knowledge transferable to natural language, and a follow-up probing analysis indicates that its success in the transfer is related to the amount of encoded contextual information.
LSTMs Compose — and Learn — Bottom-Up
TLDR
These synthetic experiments support a specific hypothesis about how hierarchical structures are discovered over the course of training: that LSTM constituent representations are learned bottom-up, relying on effective representations of their shorter children, rather than on learning the longer-range relations independently.
How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
TLDR
This work empirically shows that the context-update vectors of an LSTM are approximately quantized to binary or ternary values, which helps the language model count the depth of nesting accurately, and shows that natural clusters of function words and of the parts of speech that trigger phrases are represented in a small but principal subspace of the LSTM's context-update vector.
Word Interdependence Exposes How LSTMs Compose Representations
TLDR
A novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates, is presented; it supports the hypothesis that parent constituents rely on effective representations of their children, rather than on learning long-range relations independently.
Analysis Methods in Neural Language Processing: A Survey
TLDR
Analysis methods in neural language processing are reviewed and categorized according to prominent research trends, existing limitations are highlighted, and potential directions for future work are pointed out.
Recoding latent sentence representations - Dynamic gradient-based activation modification in RNNs
TLDR
This thesis presents a generalized framework for dynamically adapting hidden activations based on local error signals (recoding) and implements it in the form of a novel mechanism, which produces only minor improvements over the baseline due to challenges in its practical application and the limited efficacy of the tested model variants.
…

References

Showing 1-10 of 25 references
Visualizing and Understanding Recurrent Networks
TLDR
This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
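
A minimal sketch of the inspection mechanics behind such visualizations is given below, assuming PyTorch and an untrained character-level LSTM. With a trained language model, some units track properties such as quotes, brackets, or line length; here the weights are random, so the code only shows how one would dump a single cell's activation per character. The text, sizes, and cell_id are hypothetical.

# Inspect one hidden unit of a character-level LSTM over a text snippet.
# Assumptions: random (untrained) weights; in practice a trained LM is loaded.
import torch
import torch.nn as nn

text = 'He said "hello (world)" and left.'
vocab = sorted(set(text))
idx = torch.tensor([[vocab.index(c) for c in text]])   # (1, T)

embed = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(16, 64, batch_first=True)

with torch.no_grad():
    out, _ = lstm(embed(idx))      # out: (1, T, 64) hidden states
    cell_id = 7                    # hypothetical unit to inspect
    for ch, a in zip(text, out[0, :, cell_id]):
        print(f"{ch!r}\t{a.item():+.2f}")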
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
TLDR
It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
TLDR
This work proposes a framework that facilitates better understanding of the encoded representations of sentence vectors and demonstrates the potential contribution of the approach by analyzing different sentence representation mechanisms.
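
A minimal sketch of an auxiliary prediction (probing) task in this spirit follows, with stand-ins for the real components: random word embeddings and a sum-of-embeddings "sentence encoder" replace a trained encoder, and a scikit-learn logistic regression serves as the probe for a word-content property. None of these choices are taken from the paper; they only illustrate the train-a-probe-on-fixed-vectors recipe.

# Probe a fixed sentence vector for a surface property: does it contain a
# particular word id? Assumptions: numpy, scikit-learn, toy random embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
VOCAB, DIM, LEN, TARGET = 1000, 64, 10, 42
emb = rng.normal(size=(VOCAB, DIM))

def make_example(contains_target):
    words = rng.integers(0, VOCAB, size=LEN)
    words[0] = TARGET if contains_target else (TARGET + 1) % VOCAB
    return emb[words].sum(axis=0)          # "sentence vector" = sum of embeddings

y = rng.integers(0, 2, size=4000)
X = np.stack([make_example(label) for label in y])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("word-content probe accuracy:", round(probe.score(X_te, y_te), 3))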
Visualizing and Understanding Neural Models in NLP
TLDR
Four strategies for visualizing compositionality in neural models for NLP, inspired by similar work in computer vision, including LSTM-style gates that measure information flow and gradient back-propagation, are described.
Colorless green recurrent networks dream hierarchically
TLDR
The authors' language-model-trained RNNs make reliable predictions about long-distance agreement and do not lag much behind human performance, bringing support to the hypothesis that RNNs are not just shallow pattern extractors but also acquire deeper grammatical competence.
Recurrent neural network based language model
TLDR
Results indicate that it is possible to obtain around a 50% reduction of perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
On the Practical Computational Power of Finite Precision RNNs for Language Recognition
TLDR
It is shown that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU.
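
To make the counting argument concrete, here is a small hand-wired sketch (assumptions: numpy; saturated, hand-chosen gate values not taken from either paper) of a single LSTM-style cell whose unbounded memory counts a's minus b's, the kind of mechanism that lets an LSTM track counts such as those needed for aⁿbⁿ. Note that it only checks the count, not the ordering of the symbols.

# A single hand-set LSTM-style cell acting as a counter over {a, b} strings.
# Gate pre-activations are pushed far into saturation so i ≈ 1, f ≈ 1, g ≈ ±1.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step(c, symbol):
    i = sigmoid(20.0)                                # input gate ~ 1
    f = sigmoid(20.0)                                # forget gate ~ 1 (keep the count)
    g = np.tanh(20.0 if symbol == "a" else -20.0)    # candidate ~ +1 for 'a', -1 for 'b'
    return f * c + i * g                             # cell update: c changes by about ±1

for s in ["aaabbb", "aaabb", "aabbb"]:
    c = 0.0
    for ch in s:
        c = step(c, ch)
    print(s, "balanced" if abs(c) < 0.5 else "unbalanced", round(c, 2))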
Extensions of recurrent neural network language model
TLDR
Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than a 15-fold speedup of both the training and testing phases, as well as ways to reduce the number of parameters in the model.
What do Neural Machine Translation Models Learn about Morphology?
TLDR
This work analyzes the representations learned by neural MT models at various levels of granularity and empirically evaluates the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks.
Finding Structure in Time
TLDR
A proposal along these lines, first described by Jordan (1986), which uses recurrent links to provide networks with a dynamic memory, is developed, and a method for representing lexical categories and the type/token distinction is suggested.
…