Highway State Gating for Recurrent Highway Networks: Improving Information Flow Through Time

@inproceedings{Shoham2018HighwaySG,
  title={Highway State Gating for Recurrent Highway Networks: Improving Information Flow Through Time},
  author={Ron Shoham and Haim H. Permuter},
  booktitle={CSCML},
  year={2018}
}
Recurrent Neural Networks (RNNs) play a major role in the field of sequential learning and have outperformed traditional algorithms on many benchmarks. Training deep RNNs remains a challenge, however, and most state-of-the-art models use a transition depth of only 2–4 layers. Recurrent Highway Networks (RHNs) were introduced to tackle this issue and have achieved state-of-the-art performance on a few benchmarks using a depth of 10 layers. However, the performance of…
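Since the abstract is truncated here, the following is only a rough sketch: a minimal numpy implementation of one RHN time step with an added gated highway connection between consecutive hidden states, in the spirit of the highway state gating idea named in the title. The layer depth, the coupled carry gate, and all parameter names (Wg, Rg, bg, etc.) are illustrative assumptions, not the authors' exact formulation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rhn_hsg_step(x, s_prev, p, depth=4):
    """One time step: `depth` stacked RHN transition layers, then a gated
    highway connection back to the previous time step's state (assumed form)."""
    s = s_prev
    for l in range(depth):
        # The external input feeds only the first transition layer,
        # as in standard Recurrent Highway Networks.
        xh = x @ p["Wh"] + p["bh"][l] if l == 0 else p["bh"][l]
        xt = x @ p["Wt"] + p["bt"][l] if l == 0 else p["bt"][l]
        h = np.tanh(xh + s @ p["Rh"][l])   # candidate state
        t = sigmoid(xt + s @ p["Rt"][l])   # transform gate
        s = h * t + s * (1.0 - t)          # coupled carry gate (c = 1 - t)
    # Hypothetical highway state gate: mix the layer-stack output with the
    # previous time step's state, giving a more direct path through time.
    g = sigmoid(x @ p["Wg"] + s_prev @ p["Rg"] + p["bg"])
    return g * s + (1.0 - g) * s_prev

def init_params(n_in, n_hid, depth=4, seed=0):
    rng = np.random.default_rng(seed)
    def m(a, b):
        return rng.normal(scale=0.1, size=(a, b))
    return {
        "Wh": m(n_in, n_hid), "Wt": m(n_in, n_hid), "Wg": m(n_in, n_hid),
        "Rh": [m(n_hid, n_hid) for _ in range(depth)],
        "Rt": [m(n_hid, n_hid) for _ in range(depth)],
        "Rg": m(n_hid, n_hid),
        "bh": [np.zeros(n_hid) for _ in range(depth)],
        "bt": [np.full(n_hid, -2.0) for _ in range(depth)],  # bias toward carrying
        "bg": np.zeros(n_hid),
    }

# Usage: run a short random sequence through the cell.
n_in, n_hid, depth = 8, 16, 4
p = init_params(n_in, n_hid, depth)
s = np.zeros(n_hid)
for x in np.random.default_rng(1).normal(size=(5, n_in)):
    s = rhn_hsg_step(x, s, p, depth)
print(s.shape)  # (16,)

In this sketch the final gate g lets information (and gradients) pass directly from one time step's state to the next, which matches the stated goal of improving information flow through time.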

References

Showing 10 of 19 references
Recurrent Highway Networks
TLDR: A novel theoretical analysis of recurrent networks based on Geršgorin's circle theorem is introduced that illuminates several modeling and optimization issues and improves the understanding of the LSTM cell.
Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
TLDR: A novel deep recurrent architecture, residual LSTM, is introduced; it separates the spatial shortcut path from the temporal one by using output layers, which helps avoid a conflict between spatial- and temporal-domain gradient flows.
Densely Connected Convolutional Networks
TLDR: The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion and has several compelling advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
Revisiting Activation Regularization for Language RNNs
TLDR: Traditional regularization techniques are revisited, specifically L2 regularization on RNN activations and slowness regularization over successive hidden states, to improve the performance of RNNs on language modeling.
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
TLDR: This work applies a new variational-inference-based dropout technique to LSTM and GRU models; it outperforms existing techniques and, to the best of the authors' knowledge, improves on the single-model state of the art in language modelling on the Penn Treebank.
Recurrent Residual Learning for Sequence Classification
TLDR: For sequence classification tasks, incorporating residual connections into recurrent structures yields accuracy similar to a Long Short-Term Memory (LSTM) RNN with far fewer model parameters.
Long Short-Term Memory
TLDR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
TLDR: This work proposes zoneout, a novel method for regularizing RNNs that uses random noise to train a pseudo-ensemble, improving generalization; an empirical investigation of various RNN regularizers finds that zoneout gives significant performance improvements across tasks.
On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures
TLDR: A new measure based on topological concepts is introduced for evaluating the complexity of the function implemented by a neural network used for classification; the results support the idea that deep networks implement functions of higher complexity and are therefore able, with the same number of resources, to address more difficult problems.
Deep Residual Learning for Image Recognition
TLDR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth.