Corpus ID: 1101453

Recurrent Highway Networks

@inproceedings{Zilly2016RecurrentHN,
  title={Recurrent Highway Networks},
  author={Julian G. Zilly and Rupesh Kumar Srivastava and Jan Koutn{\'i}k and J{\"u}rgen Schmidhuber},
  booktitle={International Conference on Machine Learning},
  year={2016}
}
Many sequential processing tasks require complex nonlinear transition functions from one step to the next. However, recurrent neural networks with 'deep' transition functions remain difficult to train, even when using Long Short-Term Memory (LSTM) networks. We introduce a novel theoretical analysis of recurrent networks based on Geršgorin's circle theorem that illuminates several modeling and optimization issues and improves our understanding of the LSTM cell. Based on this analysis we propose… 
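
As a point of reference, below is a minimal NumPy sketch of the recurrent highway transition proposed in the paper, as I read it: within each time step the state passes through several stacked highway layers, each mixing a tanh candidate with the carried state through transform and carry gates (the coupled variant c = 1 − t is assumed here). Parameter names and initializations are illustrative, not the paper's.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rhn_step(x, s_prev, params, depth):
    """One time step of a Recurrent Highway Network transition of the given depth.

    x      : input vector at this time step, shape (input_dim,)
    s_prev : recurrent state from the previous time step, shape (hidden_dim,)
    params : per-layer recurrent weights R_H, R_T and biases b_H, b_T, plus
             input projections W_H, W_T applied at the first layer only.
    """
    s = s_prev
    for l in range(depth):
        # The external input feeds only the first highway layer of the transition.
        x_H = params["W_H"] @ x if l == 0 else 0.0
        x_T = params["W_T"] @ x if l == 0 else 0.0
        h = np.tanh(x_H + params["R_H"][l] @ s + params["b_H"][l])  # candidate state
        t = sigmoid(x_T + params["R_T"][l] @ s + params["b_T"][l])  # transform gate
        c = 1.0 - t                                                 # coupled carry gate
        s = h * t + s * c                                           # highway combination
    return s

# Toy usage: random parameters, recurrence depth 4.
rng = np.random.default_rng(0)
input_dim, hidden_dim, depth = 8, 16, 4
params = {
    "W_H": rng.normal(0.0, 0.1, (hidden_dim, input_dim)),
    "W_T": rng.normal(0.0, 0.1, (hidden_dim, input_dim)),
    "R_H": [rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)) for _ in range(depth)],
    "R_T": [rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)) for _ in range(depth)],
    "b_H": [np.zeros(hidden_dim) for _ in range(depth)],
    "b_T": [np.full(hidden_dim, -2.0) for _ in range(depth)],  # negative bias keeps gates mostly closed early in training
}
s_next = rhn_step(rng.normal(size=input_dim), np.zeros(hidden_dim), params, depth)
```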

Citations

Fast-Slow Recurrent Neural Networks

The approach is general, as any kind of RNN cell is a possible building block for the FS-RNN architecture, which can thus be flexibly applied to different tasks.

Recurrent Highway Networks With Grouped Auxiliary Memory

This paper proposes a novel RNN architecture based on Recurrent Highway Network with Grouped Auxiliary Memory (GAM-RHN), which interconnects the RHN with a set of auxiliary memory units specifically for storing long-term information via reading and writing operations, analogous to Memory Augmented Neural Networks (MANNs).

Residual Recurrent Highway Networks for Learning Deep Sequence Prediction Models

R2HN is proposed, which contains highways within the temporal structure of the network for unimpeded information propagation, thus alleviating the vanishing-gradient problem and acting as a residual learning framework to prevent the performance-degradation problem.

Neural Machine Translation with Recurrent Highway Networks

This paper examines the effectiveness of the simple Recurrent Highway Networks (RHN) in NMT tasks, and investigates the effects of increasing recurrent depth in each time step.

Character-Level Language Modeling with Recurrent Highway Hypernetworks

By combining RHNs and hypernetworks, this work shows that these approaches are complementary and achieves a significant improvement over the current state-of-the-art character-level language modeling performance on Penn Treebank while relying on much simpler regularization.

Highway State Gating for Recurrent Highway Networks: Improving Information Flow Through Time

A novel and simple variation of the RHN cell, called Highway State Gating (HSG), is introduced, which allows adding more layers while continuing to improve performance; empirical results show that the improvement holds for all depths and grows as depth increases.

From Nodes to Networks: Evolving Recurrent Neural Networks

This paper proposes a new method, evolution of a tree-based encoding of gated memory nodes, shows that it makes it possible to explore new variations more effectively than other methods, and discovers nodes with multiple recurrent paths and multiple memory cells, which lead to a significant improvement on the standard language modeling benchmark task.

Highway-LSTM and Recurrent Highway Networks for Speech Recognition

Novel Highway-LSTM models with bottleneck skip connections are experimented with, and it is shown that a 10-layer model can outperform a state-of-the-art 5-layer LSTM model with the same number of parameters by 2% relative WER.

Discovering Gated Recurrent Neural Network Architectures

This chapter proposes a new method, evolution of a tree-based encoding of gated memory nodes, shows that it makes it possible to explore new variations more effectively than other methods, and discovers nodes with multiple recurrent paths and multiple memory cells, which lead to a significant improvement on the standard language modeling benchmark task.

Multiplicative LSTM for sequence modelling

It is demonstrated empirically that mLSTM outperforms standard LSTM and its deep variants on a range of character-level language modelling tasks, and it is argued that this makes it more expressive for autoregressive density estimation.
...

References

Showing 1-10 of 58 references

Multiplicative LSTM for sequence modelling

It is demonstrated empirically that mLSTM outperforms standard LSTM and its deep variants on a range of character-level language modelling tasks, and it is argued that this makes it more expressive for autoregressive density estimation.
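
As a rough illustration of the mLSTM idea, the hedged NumPy sketch below forms an input-dependent intermediate state by elementwise multiplication and uses it in place of h_{t-1} inside the usual LSTM gate computations; all parameter names are my own shorthand, not the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlstm_step(x, h_prev, c_prev, p):
    """One multiplicative-LSTM step: an input-dependent intermediate state m
    stands in for h_{t-1} inside the usual LSTM gate computations."""
    m = (p["W_mx"] @ x) * (p["W_mh"] @ h_prev)              # multiplicative intermediate state
    i = sigmoid(p["W_ix"] @ x + p["W_im"] @ m + p["b_i"])   # input gate
    f = sigmoid(p["W_fx"] @ x + p["W_fm"] @ m + p["b_f"])   # forget gate
    o = sigmoid(p["W_ox"] @ x + p["W_om"] @ m + p["b_o"])   # output gate
    g = np.tanh(p["W_cx"] @ x + p["W_cm"] @ m + p["b_c"])   # candidate cell update
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

# Toy usage with random parameters.
rng = np.random.default_rng(1)
d_in, d_h = 8, 16
p = {k: rng.normal(0.0, 0.1, (d_h, d_in)) for k in ("W_mx", "W_ix", "W_fx", "W_ox", "W_cx")}
p.update({k: rng.normal(0.0, 0.1, (d_h, d_h)) for k in ("W_mh", "W_im", "W_fm", "W_om", "W_cm")})
p.update({k: np.zeros(d_h) for k in ("b_i", "b_f", "b_o", "b_c")})
h, c = mlstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), p)
```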

Context dependent recurrent neural network language model

This paper improves the performance of recurrent neural network language models by providing, alongside each word, a contextual real-valued input vector that conveys information about the sentence being modeled, obtained by performing Latent Dirichlet Allocation on a block of preceding text.

Exploring the Limits of Language Modeling

This work explores recent advances in Recurrent Neural Networks for large scale Language Modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and the complex, long-term structure of language.

An Empirical Exploration of Recurrent Network Architectures

It is found that adding a bias of 1 to the LSTM's forget gate closes the gap between the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks.
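
The forget-gate-bias trick described in this reference is straightforward to apply at initialization time. The snippet below is a hedged sketch, assuming PyTorch's nn.LSTM (whose per-layer biases are packed as input/forget/cell/output slices and summed across bias_ih and bias_hh); it is an illustration, not code from the reference.

```python
import torch
import torch.nn as nn

def set_forget_bias(lstm: nn.LSTM, value: float = 1.0) -> None:
    """Set the effective forget-gate bias of every layer to `value`.

    PyTorch packs each bias vector as [input, forget, cell, output] gate slices
    of size hidden_size and adds bias_ih and bias_hh together, so the forget
    slice of bias_ih gets the value and the matching slice of bias_hh is zeroed.
    """
    h = lstm.hidden_size
    with torch.no_grad():
        for name, param in lstm.named_parameters():
            if name.startswith("bias_ih"):
                param[h:2 * h].fill_(value)   # forget-gate slice
            elif name.startswith("bias_hh"):
                param[h:2 * h].zero_()        # avoid double-counting the bias

lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2)
set_forget_bias(lstm, 1.0)  # the "+1 forget-gate bias" initialization discussed above
```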

Grid Long Short-Term Memory

The Grid LSTM is used to define a novel two-dimensional translation model, the Reencoder, and it is shown that it outperforms a phrase-based reference system on a Chinese-to-English translation task.

Adaptive Computation Time for Recurrent Neural Networks

Performance is dramatically improved and insight is provided into the structure of the data, with more computation allocated to harder-to-predict transitions such as spaces between words and ends of sentences; this suggests that ACT or other adaptive computation methods could provide a generic method for inferring segment boundaries in sequence data.
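
For context, the core halting mechanism of ACT can be sketched as follows (a hedged NumPy reconstruction; cell, halt_w, halt_b, and max_ponder are placeholder names of mine): each input step is pondered repeatedly until the accumulated halting probability reaches 1 − ε, the final ponder step receives the remaining probability mass, and the returned state is the halting-weighted mean.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def act_step(x, s_prev, cell, halt_w, halt_b, eps=0.01, max_ponder=10):
    """Adaptive Computation Time for a single input step.

    cell(x_aug, s) -> s_new is any RNN transition; halt_w, halt_b parameterize
    the sigmoidal halting unit. Returns the halting-weighted mean state and the
    ponder cost (number of updates plus the leftover probability mass).
    """
    s = s_prev
    halt_sum = 0.0
    weighted_state = np.zeros_like(s_prev)
    n = 0
    while True:
        n += 1
        flag = 1.0 if n == 1 else 0.0                 # marks the first ponder step
        s = cell(np.append(x, flag), s)
        p_halt = float(sigmoid(halt_w @ s + halt_b))
        if halt_sum + p_halt >= 1.0 - eps or n == max_ponder:
            remainder = 1.0 - halt_sum                # leftover probability mass
            weighted_state += remainder * s
            return weighted_state, n + remainder
        halt_sum += p_halt
        weighted_state += p_halt * s

# Toy usage with a vanilla tanh transition (illustrative only).
rng = np.random.default_rng(0)
W_x, W_s = rng.normal(0.0, 0.3, (16, 9)), rng.normal(0.0, 0.3, (16, 16))
cell = lambda x_aug, s: np.tanh(W_x @ x_aug + W_s @ s)
state, ponder_cost = act_step(rng.normal(size=8), np.zeros(16), cell, rng.normal(0.0, 0.3, 16), -1.0)
```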

LSTM: A Search Space Odyssey

This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling, observes that the studied hyperparameters are virtually independent, and derives guidelines for their efficient adjustment.

How to Construct Deep Recurrent Neural Networks

Two novel architectures of a deep RNN are proposed which are orthogonal to an earlier attempt of stacking multiple recurrent layers to build a deep RNN, and an alternative interpretation is provided using a novel framework based on neural operators.

Learning to Forget: Continual Prediction with LSTM

This work identifies a weakness of LSTM networks processing continual input streams that are not a priori segmented into subsequences with explicitly marked ends at which the network's internal state could be reset, and proposes a novel, adaptive forget gate that enables an LSTM cell to learn to reset itself at appropriate times, thus releasing internal resources.

Neural Architecture Search with Reinforcement Learning

This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
...