• Corpus ID: 1399322

End-To-End Memory Networks

Sainbayar Sukhbaatar, Arthur D. Szlam, Jason Weston, Rob Fergus

We introduce a neural network with a recurrent attention model over a possibly large external memory. […] The flexibility of the model allows us to apply it to tasks as diverse as (synthetic) question answering and to language modeling. For the former, our approach is competitive with Memory Networks, but with less supervision. For the latter, on the Penn TreeBank and Text8 datasets, our approach demonstrates comparable performance to RNNs and LSTMs. In both cases we show that the key concept of…
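
The idea of repeated attention over an external memory can be illustrated with a minimal sketch. The abstract here is truncated, so the details are assumptions made for illustration only: dot-product scoring, separate key and value vectors per memory slot, and an additive update of the query between hops. The names `memory_hop` and `multi_hop` are hypothetical and are not taken from the paper's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_hop(query, keys, values):
    """One soft-attention read over the memory (assumed form).

    keys[i] is matched against the query; values[i] is what is read out.
    Returns the updated query and the attention weights.
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(query)
    # Weighted sum of the value vectors, one dimension at a time.
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    # Assumption: the read-out is added to the query to form the next query.
    new_query = [q + r for q, r in zip(query, read)]
    return new_query, weights


def multi_hop(query, keys, values, hops=3):
    """Apply memory_hop repeatedly, recording the attention at each hop."""
    history = []
    for _ in range(hops):
        query, weights = memory_hop(query, keys, values)
        history.append(weights)
    return query, history


# Toy memory with three slots of dimension 2 (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[0.5, 0.0], [0.0, 0.5], [0.1, 0.1]]
state, attention = multi_hop([1.0, 0.0], keys, values, hops=2)
```

Because every step is a differentiable weighted sum rather than a hard lookup, a model built this way could in principle be trained from input-output pairs alone, which is consistent with the "less supervision" claim in the abstract.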


Recurrent Memory Networks for Language Modeling

The Recurrent Memory Network (RMN) is proposed, a novel RNN architecture that not only amplifies the power of RNNs but also facilitates understanding of their internal functioning and helps discover underlying patterns in data.

An Improved End-To-End Memory Network for QA Tasks

A novel end-to-end memory network based on gated linear units (GLU) and local attention (MemN2N-GL) is proposed, motivated by the success of attention mechanisms in neural machine translation; it is better able to capture complex memory-query relations and performs better on some subtasks.

Guided Sequence-to-Sequence Learning with External Rule Memory

This paper proposes to store part of the instructions, specifically the transformation rules in sequence-to-sequence learning tasks, in an external memory attached to a neural system.


Tracking the World State with Recurrent Entity Networks

The EntNet is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data, and is the first method to solve all the tasks in the 10k training examples setting.

Improving End-to-End Memory Networks with Unified Weight Tying

This work proposes a unified model generalising weight tying and achieves uniformly high performance, improving on the best results for memory network-based models on the bAbI dataset and obtaining competitive results on Dialog bAbI.

Match memory recurrent networks

This paper proposes a novel attention method based on a function between neuron activities, termed a "match function", which is augmented by a recursive softmax function. It shows stronger performance when only one memory hop is used, in terms of both average score and solved questions.

Gated End-to-End Memory Networks

A novel end-to-end memory access regulation mechanism is introduced, inspired by recent progress on the connection short-cutting principle in computer vision.

End-to-End Memory-Enhanced Neural Architectures for Automatic Speech Recognition

Though the ultimate memory models do not quite surpass the performance of the pure attention-based models, they are quite promising and perhaps indicate that with proper fine-tuning these networks could obtain superlative performance on ASR tasks.

Progressive Memory Banks for Incremental Domain Adaptation

This paper addresses the problem of incremental domain adaptation (IDA) in natural language processing (NLP) by adopting the recurrent neural network widely used in NLP, but augmenting it with a directly parameterized memory bank, which is retrieved by an attention mechanism at each step of RNN transition.



LSTM Neural Networks for Language Modeling

This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.

Memory Networks

This work describes a new class of learning models called memory networks, which reason with inference components combined with a long-term memory component; they learn how to use these jointly.

A Clockwork RNN

This paper introduces a simple, yet powerful modification to the simple RNN architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate.

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling

Recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling, and the GRU is found to be comparable to the LSTM.

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets

The limitations of standard deep learning approaches are discussed and it is shown that some of these limitations can be overcome by learning how to grow the complexity of a model in a structured way.

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

Recurrent Neural Network Regularization

This paper shows how to correctly apply dropout to LSTMs, and shows that it substantially reduces overfitting on a variety of tasks.

Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory

An analog stack is developed which reverts to a discrete stack by quantization of all activations after the network has learned the transition rules and stack actions; the network's learning capabilities are further enhanced by providing hints.

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.