LSTMs Exploit Linguistic Attributes of Data

@inproceedings{Liu2018LSTMsEL,
  title={LSTMs Exploit Linguistic Attributes of Data},
  author={Nelson F. Liu and Omer Levy and Roy Schwartz and Chenhao Tan and Noah A. Smith},
  booktitle={Rep4NLP@ACL},
  year={2018}
}
While recurrent neural networks have found success in a variety of natural language processing applications, they are general models of sequential data. [...] Furthermore, we show that the LSTM learns to solve the memorization task by explicitly using a subset of its neurons to count timesteps in the input. We hypothesize that the patterns and structure in natural language data enable LSTMs to learn by providing approximate ways of reducing loss, but understanding the effect of different training…
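The counting claim above can be probed directly. The sketch below is a hypothetical illustration, not the paper's code: it runs sequences through an LSTM and ranks hidden units by how strongly their activations correlate with the timestep index. In practice the trained model from the memorization experiments would be used; here the embedding, the LSTM, and all sizes are random stand-ins chosen for the example.

# Minimal sketch (not the paper's code): find LSTM units whose activations
# track the timestep index, i.e. behave like counters.
import torch
import numpy as np

vocab_size, emb_dim, hidden_dim, seq_len, n_seqs = 50, 32, 128, 300, 64
embed = torch.nn.Embedding(vocab_size, emb_dim)            # stand-in, untrained
lstm = torch.nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # stand-in, untrained

with torch.no_grad():
    tokens = torch.randint(0, vocab_size, (n_seqs, seq_len))
    outputs, _ = lstm(embed(tokens))        # (n_seqs, seq_len, hidden_dim)
    acts = outputs.numpy()                  # hidden states at every timestep

# Correlate each unit's activation with the timestep index, averaged over sequences.
timesteps = np.arange(seq_len)
corrs = np.zeros(hidden_dim)
for unit in range(hidden_dim):
    per_seq = [np.corrcoef(acts[s, :, unit], timesteps)[0, 1] for s in range(n_seqs)]
    corrs[unit] = np.mean(per_seq)

top = np.argsort(-np.abs(corrs))[:5]
print("candidate counting units:", top, corrs[top])

Units with correlations near +1 or -1 are the candidate counters; the paper reports that the trained memorization model contains such a subset of neurons.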
Citations

On Evaluating the Generalization of LSTM Models in Formal Languages
This paper empirically evaluates the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, to learn simple formal languages.
Understanding Learning Dynamics Of Language Models with SVCCA
This first study of the learning dynamics of neural language models uses a simple and flexible analysis method called Singular Vector Canonical Correlation Analysis (SVCCA), which makes it possible to compare learned representations across time and across models without evaluating directly on annotated data.
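As a rough sketch of the SVCCA procedure named in this summary (reduce each representation to its top singular directions, then take canonical correlations between the reduced views), the following is a minimal re-implementation under assumptions of my own: the 0.99 variance threshold and the random stand-in activations are illustrative choices, not details from the paper.

# Minimal SVCCA sketch. Inputs are (n_examples, n_units) activation matrices.
import numpy as np

def svd_reduce(acts, var_threshold=0.99):
    """Project activations onto the top singular directions covering var_threshold of variance."""
    acts = acts - acts.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(acts, full_matrices=False)
    keep = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), var_threshold) + 1
    return acts @ Vt[:keep].T               # (n_examples, keep)

def cca_correlations(X, Y):
    """Canonical correlations = singular values of Qx^T Qy for orthonormal bases Qx, Qy."""
    Qx, _ = np.linalg.qr(X - X.mean(0))
    Qy, _ = np.linalg.qr(Y - Y.mean(0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

def svcca_similarity(acts1, acts2):
    return cca_correlations(svd_reduce(acts1), svd_reduce(acts2)).mean()

# Example: compare the same layer at two training checkpoints (random stand-ins).
rng = np.random.default_rng(0)
early = rng.standard_normal((1000, 256))
late = early @ rng.standard_normal((256, 256)) * 0.5 + 0.1 * rng.standard_normal((1000, 256))
print("SVCCA similarity:", svcca_similarity(early, late))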
LSTMs Compose — and Learn — Bottom-Up
These synthetic experiments support a specific hypothesis about how hierarchical structures are discovered over the course of training: LSTM constituent representations are learned bottom-up, relying on effective representations of their shorter children rather than learning longer-range relations independently.
How LSTM Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text
This work empirically shows that the context-update vectors of an LSTM are approximately quantized to binary or ternary values, which helps the language model count the depth of nesting accurately, and that natural clusters of function words and the parts of speech that trigger phrases are represented in a small but principal subspace of the LSTM's context-update vector.
State gradients for analyzing memory in LSTM language models
This paper proposes a normalization method that alleviates the influence of variance in the embedding space on the state gradients and shows the effectiveness of the method on a synthetic dataset.
Word Interdependence Exposes How LSTMs Compose Representations
This paper presents a novel measure of interdependence between word meanings in an LSTM, based on their interactions at the internal gates, which supports the hypothesis that parent constituents rely on effective representations of their children rather than learning long-range relations independently.
Analysis Methods in Neural Language Processing: A Survey
This survey reviews analysis methods in neural language processing, categorizes them according to prominent research trends, highlights existing limitations, and points to potential directions for future work.
Recoding latent sentence representations - Dynamic gradient-based activation modification in RNNs
This thesis presents a generalized framework for dynamically adapting hidden activations based on local error signals (recoding) and implements it in the form of a novel mechanism, which produces only minor improvements over the baseline due to challenges in its practical application and the limited efficacy of the tested model variants.
Language Models Learn POS First
It is demonstrated that different aspects of linguistic structure are learned at different rates, with part-of-speech tagging acquired early and global topic information learned continuously.

References

(Showing 1-10 of 25 references)
Visualizing and Understanding Recurrent Networks
This work uses character-level language models as an interpretable testbed to analyze LSTM representations, predictions, and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes, and brackets.
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language-modeling signal is insufficient for capturing syntax-sensitive dependencies and should be supplemented with more direct supervision if such dependencies need to be captured.
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
This work proposes a framework that facilitates better understanding of the encoded representations of sentence vectors and demonstrates the potential contribution of the approach by analyzing different sentence representation mechanisms.
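The auxiliary-prediction-task idea reduces to training a simple probe on frozen sentence vectors. The sketch below is an illustrative stand-in rather than the paper's setup: the embeddings, the length-bucket labels, and the logistic-regression probe are all assumptions chosen for brevity.

# Minimal probing sketch: predict a surface property (a hypothetical
# sentence-length bucket) from fixed sentence embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((2000, 300))     # stand-in for LSTM sentence encodings
length_bucket = rng.integers(0, 5, size=2000)     # stand-in for binned sentence length

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, length_bucket, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# If probe accuracy clearly beats the majority-class baseline, the property
# is (linearly) recoverable from the representation.
majority = np.bincount(y_tr).max() / len(y_tr)
print("probe accuracy:", probe.score(X_te, y_te), "majority baseline:", majority)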
Visualizing and Understanding Neural Models in NLP
This paper describes four strategies for visualizing compositionality in neural models for NLP, inspired by similar work in computer vision, including LSTM-style gates that measure information flow and gradient back-propagation.
Colorless green recurrent networks dream hierarchically
This work supports the hypothesis that RNNs are not just shallow-pattern extractors but also acquire deeper grammatical competence: they make reliable predictions about long-distance agreement and do not lag much behind human performance.
Recurrent neural network based language model
Results indicate that it is possible to obtain around a 50% reduction in perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
On the Practical Computational Power of Finite Precision RNNs for Language Recognition
It is shown that the LSTM and the Elman RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU.
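A quick way to see why the saturating cell state gives the LSTM this extra power is to hand-set a single cell so that it counts. The sketch below only illustrates that counting argument and is not the construction from the paper; the gate values and the toy two-symbol alphabet are assumptions.

# Hand-built one-unit LSTM counter: the cell state increments on 'a',
# decrements on 'b', and is untouched otherwise, so it tracks #a - #b
# in a way a squashing RNN cannot.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

BIG = 20.0  # saturates sigmoid/tanh to (approximately) 0/1 or -1/+1

def count_with_lstm(string):
    c = 0.0
    for ch in string:
        is_a, is_b = float(ch == 'a'), float(ch == 'b')
        i = sigmoid(BIG)                         # input gate pinned open
        f = sigmoid(BIG)                         # forget gate pinned open (remember)
        g = np.tanh(BIG * is_a - BIG * is_b)     # +1 on 'a', -1 on 'b', 0 otherwise
        c = f * c + i * g                        # cell state accumulates the count
    return c

print(count_with_lstm("aaabb"))   # ~1.0
print(count_with_lstm("aaabbb"))  # ~0.0  (balanced, as in a^n b^n)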
Extensions of recurrent neural network language model
Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than a 15x speedup in both training and testing, and possibilities for reducing the number of parameters in the model.
What do Neural Machine Translation Models Learn about Morphology?
This work analyzes the representations learned by neural MT models at various levels of granularity and empirically evaluates the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks.
Finding Structure in Time
This paper develops a proposal, first described by Jordan (1986), that uses recurrent links to provide networks with a dynamic memory, and suggests a method for representing lexical categories and the type/token distinction.