Corpus ID: 51766417

Memory Architectures in Recurrent Neural Network Language Models

@inproceedings{Yogatama2018MemoryAI,
  title={Memory Architectures in Recurrent Neural Network Language Models},
  author={Dani Yogatama and Yishu Miao and G{\'a}bor Melis and Wang Ling and Adhiguna Kuncoro and Chris Dyer and Phil Blunsom},
  booktitle={ICLR},
  year={2018}
}
We compare and analyze sequential, random access, and stack memory architectures for recurrent neural network language models. [...] We also propose a generalization of existing continuous stack models (Joulin & Mikolov, 2015; Grefenstette et al., 2015) that allows a variable number of pop operations more naturally and further improves performance. We further evaluate these language models in terms of their ability to capture non-local syntactic dependencies on a subject-verb agreement dataset (Linzen…
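To make the continuous-stack idea concrete, below is a loose numpy sketch of a soft stack whose update blends a push candidate with candidates for popping k = 0, 1, 2, ... elements, each weighted by controller probabilities. The function name, fixed stack depth, and the way the multi-pop weights are combined are illustrative assumptions, not the paper's exact formulation.

# Illustrative soft-stack update in the spirit of continuous stack models
# (Joulin & Mikolov, 2015; Grefenstette et al., 2015), extended to a
# distribution over how many elements to pop. Names and shapes are
# assumptions, not the paper's formulation.
import numpy as np

def soft_stack_update(stack, push_vec, p_push, p_pops):
    """Blend a push candidate with k-fold pop candidates of a soft stack.

    stack   : (depth, dim) array, row 0 is the top.
    push_vec: (dim,) vector produced by the controller.
    p_push  : scalar probability of pushing.
    p_pops  : (K,) probabilities of popping k = 0..K-1 elements
              (p_push + p_pops.sum() should equal 1).
    """
    depth, dim = stack.shape
    # Push candidate: shift everything down and write push_vec on top.
    pushed = np.vstack([push_vec[None, :], stack[:-1]])
    new_stack = p_push * pushed
    # Pop candidates: popping k elements shifts the stack up by k rows.
    for k, p_k in enumerate(p_pops):
        popped = np.vstack([stack[k:], np.zeros((k, dim))])
        new_stack += p_k * popped
    return new_stack

# Example: depth-4 stack of 3-d vectors; the controller mostly wants two pops.
stack = np.arange(12, dtype=float).reshape(4, 3)
out = soft_stack_update(stack, push_vec=np.ones(3),
                        p_push=0.1, p_pops=np.array([0.1, 0.2, 0.6]))
print(out.shape)  # (4, 3)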
Citations

Ordered Memory
A new attention-based mechanism is introduced whose cumulative probability controls the writing and erasing operations of the memory, and a new Gated Recursive Cell composes lower-level representations into higher-level representations.
Sequential neural networks as automata
In recent years, neural network architectures for sequence modeling have been applied with great success to a variety of NLP tasks. What neural networks provide in performance, however, they lack in…
A Taxonomy for Neural Memory Networks
  • Ying Ma and J. Príncipe
  • IEEE Transactions on Neural Networks and Learning Systems, 2020
This paper creates a framework for memory organization and then compares four popular dynamic models, the vanilla recurrent neural network, long short-term memory, the neural stack, and neural RAM, to open the dynamic neural networks' black box from a memory-usage perspective.
Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack
This paper improves the memory-augmented RNN with architectural and state-updating mechanisms that ensure the model learns to properly balance the use of its latent states with its external memory, and it exhibits better generalization performance.
Scalable Syntax-Aware Language Models Using Knowledge Distillation
An efficient knowledge distillation (KD) technique is introduced that transfers knowledge from a syntactic language model trained on a small corpus to an LSTM language model, enabling the LSTM to develop a more structurally sensitive representation of the larger training data it learns from.
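As a rough illustration of the distillation objective this entry describes, the sketch below mixes a softened cross-entropy against a teacher's next-word distribution with the standard next-word loss. The temperature, the mixing weight, and the toy logits are assumptions, and the syntactic teacher itself is not modeled.

# Generic knowledge-distillation loss sketch; hyperparameters are assumptions.
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, gold_id, T=2.0, alpha=0.5):
    # Cross-entropy of the student against the teacher's softened distribution.
    kd = -(softmax(teacher_logits, T) * np.log(softmax(student_logits, T))).sum()
    # Ordinary next-word negative log-likelihood on the gold token.
    nll = -np.log(softmax(student_logits)[gold_id])
    return alpha * kd + (1.0 - alpha) * nll

rng = np.random.default_rng(0)
print(distillation_loss(rng.standard_normal(10), rng.standard_normal(10), gold_id=3))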
Modeling Hierarchical Structures with Continuous Recursive Neural Networks
This work proposes the Continuous Recursive Neural Network (CRvNN) as a backpropagation-friendly alternative that addresses limitations of traditional RvNNs by incorporating a continuous relaxation of the induced structure.
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
The novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
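The ordering in ON-LSTM rests on a cumulative-softmax ("cumax") activation used to build its master gates; a minimal sketch of that activation alone is given below, with the surrounding LSTM gate wiring omitted.

# Sketch of the cumax activation used by ON-LSTM's master gates.
import numpy as np

def cumax(x):
    """Cumulative softmax: a soft, monotone approximation of a 0/1 gate vector."""
    e = np.exp(x - np.max(x))
    return np.cumsum(e / e.sum())

g = cumax(np.array([2.0, 0.5, -1.0, 0.1]))
print(g)  # monotonically increasing values in (0, 1], ending at 1.0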
Neural Attentions for Natural Language Understanding and Modeling by Hongyin Luo
In this thesis, we explore the use of neural attention mechanisms for improving natural language representation learning, a fundamental concept for modern natural language processing. With the…
Can Recurrent Neural Networks Learn Nested Recursion?
This paper experimentally investigates the capability of several recurrent neural networks (RNNs) to learn nested recursion, measuring an upper bound on this capability by simplifying the task to learning a generalized Dyck language, namely one composed of matching parentheses of various kinds.
A Neural State Pushdown Automata
A “neural state” pushdown automaton (NSPDA) is proposed, consisting of a discrete stack instead of a continuous one and coupled to a neural network state machine; its effectiveness in recognizing various context-free grammars (CFGs) is shown empirically.

References

Showing 1-10 of 19 references.
Recurrent Memory Networks for Language Modeling
The Recurrent Memory Network (RMN) is proposed, a novel RNN architecture that not only amplifies the power of RNNs but also facilitates understanding of their internal functioning and allows underlying patterns in data to be discovered.
Regularizing and Optimizing LSTM Language Models
This paper proposes the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization, and introduces NT-ASGD, a variant of the averaged stochastic gradient method in which the averaging trigger is determined by a non-monotonic condition rather than being tuned by the user.
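To show what DropConnect on the recurrent weights looks like in isolation, here is a hedged sketch in which a plain tanh recurrence stands in for the LSTM; the mask rescaling and the cell choice are simplifications, not the paper's exact setup.

# Sketch of DropConnect on the hidden-to-hidden weight matrix; a tanh RNN
# cell stands in for the LSTM to keep the example short.
import numpy as np

rng = np.random.default_rng(0)

def weight_drop_step(h, x, W_hh, W_xh, drop_p=0.5, train=True):
    """One recurrent step with a DropConnect mask applied to W_hh."""
    if train:
        # Zero individual hidden-to-hidden weights, rescaling the survivors.
        mask = (rng.random(W_hh.shape) > drop_p) / (1.0 - drop_p)
        W_hh = W_hh * mask
    return np.tanh(h @ W_hh + x @ W_xh)

dim = 8
h, x = np.zeros(dim), rng.standard_normal(dim)
W_hh = 0.1 * rng.standard_normal((dim, dim))
W_xh = 0.1 * rng.standard_normal((dim, dim))
print(weight_drop_step(h, x, W_hh, W_xh).shape)  # (8,)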
On the State of the Art of Evaluation in Neural Language Models
This work reevaluates several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrives at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models.
Frustratingly Short Attention Spans in Neural Language Modeling
This paper proposes a neural language model with a key-value attention mechanism that outputs separate representations for the key and value of a differentiable memory, as well as for encoding the next-word distribution; it outperforms existing memory-augmented neural language models on two corpora.
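A bare-bones sketch of a key-value read over a memory of past time steps follows: keys score against a query, and values carry what gets mixed. The dimensions, scoring function, and variable names are illustrative assumptions rather than the model's exact parameterization.

# Minimal key-value attention read over a memory of past states.
import numpy as np

def key_value_read(query, keys, values):
    """query: (d,), keys: (T, d), values: (T, d) -> attention-weighted value."""
    scores = keys @ query                       # one score per memory slot
    w = np.exp(scores - scores.max())
    w /= w.sum()                                # softmax over the T slots
    return w @ values

rng = np.random.default_rng(1)
T, d = 5, 4
read = key_value_read(rng.standard_normal(d),
                      rng.standard_normal((T, d)),
                      rng.standard_normal((T, d)))
print(read.shape)  # (4,)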
Pointer Sentinel Mixture Models
The pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank while using far fewer parameters than a standard softmax LSTM, and the freely available WikiText corpus is introduced.
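The mixture this entry describes can be sketched as follows: a softmax over pointer scores plus a sentinel yields a gate, and the pointer mass is scattered onto the word ids seen in the recent context. All tensors below are toy stand-ins, and details such as how the scores are produced are omitted.

# Sketch of a pointer-sentinel mixture over a toy vocabulary.
import numpy as np

def pointer_sentinel_mix(p_vocab, ptr_scores, sentinel_score, context_ids, vocab_size):
    """Blend a vocabulary softmax with a pointer over recent context positions."""
    # Softmax over [pointer scores ; sentinel]; the sentinel's share is the gate g.
    z = np.concatenate([ptr_scores, [sentinel_score]])
    z = np.exp(z - z.max())
    z /= z.sum()
    ptr_probs, g = z[:-1], z[-1]
    # Scatter pointer mass onto the word ids appearing in the context window.
    p_ptr = np.zeros(vocab_size)
    np.add.at(p_ptr, context_ids, ptr_probs)
    return g * p_vocab + p_ptr   # the pointer part already carries (1 - g) mass

vocab_size = 10
p = pointer_sentinel_mix(np.full(vocab_size, 1.0 / vocab_size),
                         ptr_scores=np.array([1.0, 0.2, -0.5]),
                         sentinel_score=0.3,
                         context_ids=np.array([4, 7, 4]),
                         vocab_size=vocab_size)
print(p.sum())  # ~1.0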
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
This work introduces a novel theoretical framework that facilitates better learning in language modeling, and shows that this framework leads to tying together the input embedding and the output projection matrices, greatly reducing the number of trainable variables.
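Tying the input embedding to the output projection amounts to reusing one matrix both for lookups and for logits, as in the minimal sketch below; the tanh stand-in for the recurrent network and the toy sizes are assumptions.

# Minimal illustration of tied input embedding and output projection.
import numpy as np

rng = np.random.default_rng(2)
vocab, dim = 10, 4
E = rng.standard_normal((vocab, dim))    # shared embedding / softmax matrix

token_id = 3
x = E[token_id]                          # input: embedding lookup
h = np.tanh(x)                           # stand-in for the recurrent network
logits = E @ h                           # output: projection reuses E
print(logits.shape)                      # (10,)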
Neural Architecture Search with Reinforcement Learning
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
Variable Computation in Recurrent Neural Networks
A modification to existing recurrent units is explored that allows them to learn to vary the amount of computation they perform at each step, without prior knowledge of the sequence's time structure; this leads to better overall performance on evaluation tasks.
Improving Neural Language Models with a Continuous Cache
A simplified version of memory-augmented networks is proposed that stores past hidden activations as memory and accesses them through a dot product with the current hidden activation; it is very efficient and scales to very large memory sizes.
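The cache mechanism described here can be sketched as a dot-product softmax of the current hidden state against stored past hidden states, with the resulting mass placed on the words that followed them and then combined with the model distribution. The flatness parameter, the interpolation weight, and the toy data are assumptions; linear interpolation is just one common way to combine the two distributions.

# Sketch of a continuous cache over past hidden states.
import numpy as np

def cache_distribution(h_t, cached_h, cached_next_ids, vocab_size, theta=1.0):
    """Probability over the vocabulary induced by the cache of past states."""
    scores = theta * (cached_h @ h_t)            # dot product with each stored state
    w = np.exp(scores - scores.max())
    w /= w.sum()
    p_cache = np.zeros(vocab_size)
    np.add.at(p_cache, cached_next_ids, w)       # mass on previously seen words
    return p_cache

def mix_with_cache(p_model, p_cache, lam=0.2):
    return (1.0 - lam) * p_model + lam * p_cache # simple linear interpolation

vocab = 10
rng = np.random.default_rng(3)
p = mix_with_cache(np.full(vocab, 0.1),
                   cache_distribution(rng.standard_normal(4),
                                      rng.standard_normal((6, 4)),
                                      rng.integers(0, vocab, size=6),
                                      vocab))
print(p.sum())  # ~1.0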
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
It is concluded that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.