Colorless Green Recurrent Networks Dream Hierarchically

Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, Marco Baroni
North American Chapter of the Association for Computational Linguistics
Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation…
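
As a rough illustration of how number agreement can be scored with a language model, the sketch below counts an item as correct when the model assigns a higher log-probability to the verb form that agrees with the subject than to the other form. Everything named here is hypothetical and not taken from the paper: `agreement_accuracy`, `toy_log_prob`, and the example items are invented for illustration, and `toy_log_prob` is a hand-written stand-in that simply agrees with the most recent noun, not a trained RNN.

```python
import math

# Hypothetical toy vocabulary (an assumption for this sketch, not data from the paper).
PLURAL_NOUNS = {"boys", "keys", "cabinets"}
SINGULAR_NOUNS = {"boy", "key", "cabinet"}
PLURAL_VERBS = {"are"}


def toy_log_prob(prefix, word):
    """Stand-in for log P(word | prefix) from a language model.

    This scorer uses a purely linear heuristic: it prefers the verb form
    that agrees with the most recent noun in the prefix.
    """
    last_is_plural = False
    for tok in prefix:
        if tok in PLURAL_NOUNS:
            last_is_plural = True
        elif tok in SINGULAR_NOUNS:
            last_is_plural = False
    matches = (word in PLURAL_VERBS) == last_is_plural
    return math.log(0.9 if matches else 0.1)


def agreement_accuracy(items, log_prob):
    """Fraction of items where the agreeing verb form gets the higher score."""
    correct = 0
    for prefix, good, bad in items:
        if log_prob(prefix, good) > log_prob(prefix, bad):
            correct += 1
    return correct / len(items)


# Each item: (prefix tokens, agreeing verb form, non-agreeing verb form).
ITEMS = [
    ("the boy".split(), "is", "are"),
    ("the keys".split(), "are", "is"),
    ("the keys to the cabinet".split(), "are", "is"),   # intervening singular noun
    ("the key to the cabinets".split(), "is", "are"),   # intervening plural noun
]

if __name__ == "__main__":
    print(agreement_accuracy(ITEMS, toy_log_prob))
```

On these four items the linear stand-in scores 0.5: it fails exactly when another noun intervenes between the subject and the verb, which is why long-distance items are the informative ones for testing hierarchical structure.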

Priorless Recurrent Networks Learn Curiously

It is shown that domain-general recurrent neural networks will also learn number agreement within unnatural sentence structures, i.e. structures that are not found in any natural language and which humans struggle to process.

RNNs as psycholinguistic subjects: Syntactic state and grammatical dependency

It is demonstrated that these models represent and maintain incremental syntactic state, but that they do not always generalize in the same way as humans.

Deep RNNs Encode Soft Hierarchical Syntax

A set of experiments is presented to demonstrate that deep recurrent neural networks learn internal representations that capture soft hierarchical notions of syntax from highly varied supervision, indicating that a soft syntactic hierarchy emerges.

Can RNNs learn Recursive Nested Subject-Verb Agreements?

A new framework to study recursive processing in RNNs is presented, using subject-verb agreement as a probe into the representations of the neural network, which indicates how neural networks may extract bounded nested tree structures, without learning a systematic recursive rule.

How much complexity does an RNN architecture need to learn syntax-sensitive dependencies?

A new architecture is proposed, the Decay RNN, which incorporates the decaying nature of neuronal activations and models the excitatory and inhibitory connections in a population of neurons and shows competitive performance relative to LSTMs on subject-verb agreement, sentence grammaticality, and language modeling tasks.

What Syntactic Structures block Dependencies in RNN Language Models?

This paper demonstrates that two state-of-the-art RNN models are able to maintain the filler-gap dependency through unbounded sentential embeddings and are also sensitive to the hierarchical relationship between the filler and the gap, known as syntactic islands.

Do RNNs learn human-like abstract word order preferences?

The results show that RNNs learn the abstract features of weight, animacy, and definiteness which underlie soft constraints on syntactic alternations.

Hierarchy or Heuristic? Examining hierarchical structure and the poverty of the stimulus in recurrent neural networks

It is shown that a model can perform well on recognizing long-range dependencies and yet fail to exhibit more global hierarchical awareness, and more thorough criteria for defining hierarchical structural awareness are proposed.



Using Deep Neural Networks to Learn Syntactic Agreement

DNNs require large vocabularies to form substantive lexical embeddings in order to learn structural patterns, and this finding has interesting consequences for the understanding of the way in which DNNs represent syntactic information.

Exploring the Syntactic Abilities of RNNs with Multi-task Learning

It is shown that easily available agreement training data can improve performance on other syntactic tasks, in particular when only a limited amount of training data is available for those tasks, and the multi-task paradigm can be leveraged to inject grammatical knowledge into language models.

Memory Architectures in Recurrent Neural Network Language Models

The results demonstrate the value of stack-structured memory for explaining the distribution of words in natural language, in line with linguistic theories claiming a context-free backbone for natural language.

Simple Recurrent Networks and Natural Language: How Important is Starting Small?

Evidence is reported that starting with simplified inputs is not necessary in training recurrent networks to learn pseudo-natural languages and it is suggested that the structure of natural language can be learned without special teaching methods or limited cognitive resources.

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.

Distributed representations, simple recurrent networks, and grammatical structure

In this paper three problems for a connectionist account of language are considered: (1) What is the nature of linguistic representations? (2) How can complex structural relationships such as…

Toward a connectionist model of recursion in human linguistic performance

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

This work proposes a framework that facilitates better understanding of the encoded representations of sentence vectors and demonstrates the potential contribution of the approach by analyzing different sentence representation mechanisms.

Linguistic Regularities in Continuous Space Word Representations

The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and each relationship is characterized by a relation-specific vector offset.
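
The vector-offset idea can be sketched as follows: to answer "a is to b as c is to ?", compute `b - a + c` and return the nearest remaining word by cosine similarity. This is a minimal sketch under stated assumptions: the 2-D embeddings in `EMB` are made up by hand so that one axis encodes the relation, and the helper names `cosine` and `analogy` are hypothetical, not taken from the cited work.

```python
import math

# Hand-made 2-D toy embeddings (an assumption for illustration only;
# learned embeddings have hundreds of dimensions).
EMB = {
    "man": (1.0, 0.0),
    "woman": (1.0, 1.0),
    "king": (3.0, 0.0),
    "queen": (3.0, 1.0),
    "apple": (0.0, 3.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(emb, a, b, c):
    """Return the word d that best completes a : b :: c : d.

    The relation is represented by the offset emb[b] - emb[a], which is
    added to emb[c]; the three query words are excluded from the search.
    """
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


if __name__ == "__main__":
    print(analogy(EMB, "man", "woman", "king"))
```

With these toy vectors the offset from "man" to "woman" is the same as the offset from "king" to "queen", so the query lands exactly on "queen"; with learned embeddings the match is only approximate.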

What do Neural Machine Translation Models Learn about Morphology?

This work analyzes the representations learned by neural MT models at various levels of granularity and empirically evaluates the quality of the representations for learning morphology through extrinsic part-of-speech and morphological tagging tasks.