Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

@article{Linzen2016AssessingTA,
  title={Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies},
  author={Tal Linzen and Emmanuel Dupoux and Yoav Goldberg},
  journal={Transactions of the Association for Computational Linguistics},
  year={2016},
  volume={4},
  pages={521-535}
}
The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. [...] We probe the architecture's grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when…
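The number-prediction objective described above is simple enough to sketch directly. The snippet below is an illustrative reconstruction rather than the authors' code: the class name, dimensions, and toy vocabulary are assumptions of this example, and only the overall setup (an LSTM reads the sentence up to a present-tense verb and a classifier predicts whether that verb should be singular or plural) follows the paper.

```python
# Minimal sketch of the number-prediction task (illustrative, not the paper's code).
import torch
import torch.nn as nn

class NumberPredictionLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)  # two classes: singular vs. plural verb

    def forward(self, token_ids):
        # token_ids: (batch, prefix_length), the words preceding the verb
        embedded = self.embed(token_ids)
        _, (final_hidden, _) = self.lstm(embedded)
        # Classify from the hidden state reached just before the verb.
        return self.out(final_hidden[-1])

# Toy usage: the prefix "the keys to the cabinet" should favor a plural verb.
vocab = {"<pad>": 0, "the": 1, "keys": 2, "to": 3, "cabinet": 4}
model = NumberPredictionLSTM(vocab_size=len(vocab))
prefix = torch.tensor([[1, 2, 3, 1, 4]])  # "the keys to the cabinet"
target = torch.tensor([1])                # 1 = plural (label encoding is assumed)
loss = nn.CrossEntropyLoss()(model(prefix), target)
loss.backward()  # a supervised training step would update the parameters from here
```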
Citations

LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better
TLDR
It is found that the mere presence of syntactic information does not improve accuracy, but when model architecture is determined by syntax, number agreement is improved: top-down construction outperforms left-corner and bottom-up variants in capturing non-local structural dependencies.
How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?
TLDR
A new architecture, the Decay RNN, is proposed that incorporates the decaying nature of neuronal activations and models the excitatory and inhibitory connections in a population of neurons; it shows competitive performance relative to LSTMs on subject-verb agreement, sentence grammaticality, and language modeling tasks.
Structural Supervision Improves Learning of Non-Local Grammatical Dependencies
TLDR
It is found that the RNNG outperforms the LSTM on both types of grammatical dependencies and even learns many of the island constraints on the filler-gap dependency; this structural supervision provides data-efficiency advantages over purely string-based training of neural language models in acquiring human-like generalizations about non-local grammatical dependencies.
Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM Language Models
TLDR
A causal account of structural properties as carried by paths across gates and neurons of a recurrent neural network is introduced, offering a finer and more complete view of an LSTM's handling of this structural aspect of the English language than prior results based on diagnostic classifiers and ablation.
Evaluating the Ability of LSTMs to Learn Context-Free Grammars
TLDR
It is concluded that LSTMs do not learn the relevant underlying context-free rules, suggesting that the good overall performance is instead attained by an efficient way of evaluating nuisance variables.
Exploring the Syntactic Abilities of RNNs with Multi-task Learning
TLDR
It is shown that easily available agreement training data can improve performance on other syntactic tasks, in particular when only a limited amount of training data is available for those tasks, and the multi-task paradigm can be leveraged to inject grammatical knowledge into language models.
Attribution Analysis of Grammatical Dependencies in LSTMs
TLDR
Using layer-wise relevance propagation, it is shown that LSTM performance on number agreement is directly correlated with the model's ability to distinguish subjects from other nouns, suggesting that LSTM language models are able to infer robust representations of syntactic dependencies.
Can LSTM Learn to Capture Agreement? The Case of Basque
TLDR
It is found that sequential models perform worse on agreement prediction in Basque than one might expect on the basis of previous agreement-prediction work in English.
Scalable Syntax-Aware Language Models Using Knowledge Distillation
TLDR
An efficient knowledge distillation (KD) technique is introduced that transfers knowledge from a syntactic language model trained on a small corpus to an LSTM language model, hence enabling the LSTM to develop a more structurally sensitive representation of the larger training data it learns from.
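The distillation idea summarized here can be illustrated with the generic sequence-level KD recipe; the snippet below is a hedged sketch of that general recipe, not the paper's implementation, and the temperature, weighting, and function name are assumptions of this example.

```python
# Generic knowledge-distillation loss for language modeling (illustrative sketch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, next_word_ids,
                      temperature=2.0, alpha=0.5):
    """student_logits, teacher_logits: (batch, vocab); next_word_ids: (batch,)."""
    # Soft targets: the syntax-aware teacher's next-word distribution, softened.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Ordinary language-modeling cross-entropy on the observed next word.
    ce = F.cross_entropy(student_logits, next_word_ids)
    return alpha * kd + (1 - alpha) * ce
```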
Controlled Evaluation of Grammatical Knowledge in Mandarin Chinese Language Models
Prior work has shown that structural supervision helps English language models learn generalizations about syntactic phenomena such as subject-verb agreement. However, it remains unclear if such an…

References

Showing 1-10 of 55 references
Visualizing and Understanding Neural Models in NLP
TLDR
Four strategies for visualizing compositionality in neural models for NLP are described, inspired by similar work in computer vision, including LSTM-style gates that measure information flow and gradient back-propagation.
Language acquisition in the absence of explicit negative evidence: how important is starting small?
It is commonly assumed that innate linguistic constraints are necessary to learn a natural language, based on the apparent lack of explicit negative evidence provided to children and on Gold's proof…
Visualizing and Understanding Recurrent Networks
TLDR
This work uses character-level language models as an interpretable testbed to provide an analysis of LSTM representations, predictions and error types, and reveals the existence of interpretable cells that keep track of long-range dependencies such as line lengths, quotes and brackets.
The Acquisition of Anaphora by Simple Recurrent Networks
TLDR
This article applies Simple Recurrent Networks to the task of assigning an interpretation to reflexive and pronominal anaphora, a task that demands more refined sensitivity to syntactic structure than has been previously explored.
Statistical Representation of Grammaticality Judgements: the Limits of N-Gram Models
TLDR
This work uses a set of enriched n-gram models to track grammaticality judgements for different sorts of passive sentences in English and indicates some of the strengths and limitations of word and lexical-class n-gram models as candidate representations of speakers' grammatical knowledge.
One billion word benchmark for measuring progress in statistical language modeling
TLDR
A new benchmark corpus for measuring progress in statistical language modeling, with almost one billion words of training data, is proposed; it is useful for quickly evaluating novel language modeling techniques and for comparing their contribution when combined with other advanced techniques.
Representation of Linguistic Form and Function in Recurrent Neural Networks
TLDR
A method is proposed for estimating the contribution of individual input tokens to the networks' final prediction; the analysis shows that the visual pathway pays selective attention to lexical categories and grammatical functions that carry semantic information, and learns to treat word types differently depending on their grammatical function and their position in the sequential structure of the sentence.
Grammar as a Foreign Language
TLDR
The domain-agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers.
LSTM recurrent networks learn simple context-free and context-sensitive languages
TLDR
Long short-term memory (LSTM) variants are also the first RNNs to learn a simple context-sensitive language, namely a^n b^n c^n.
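As a concrete illustration of the language named in this summary, the following short helper (my own example, not from the cited paper) generates and checks strings of the form a^n b^n c^n, the kind of counting-dependent data such experiments train and test on.

```python
# Tiny helpers for the context-sensitive language a^n b^n c^n (illustrative only).
def generate(n):
    return "a" * n + "b" * n + "c" * n

def is_member(s):
    n = len(s) // 3
    return len(s) == 3 * n and s == generate(n)

print([generate(n) for n in range(1, 4)])        # ['abc', 'aabbcc', 'aaabbbccc']
print(is_member("aabbcc"), is_member("aabcc"))   # True False
```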
LSTM Neural Networks for Language Modeling
TLDR
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.