Advances in optimizing recurrent networks

@inproceedings{Bengio2013AdvancesIO,
  title={Advances in optimizing recurrent networks},
  author={Yoshua Bengio and Nicolas Boulanger-Lewandowski and Razvan Pascanu},
  booktitle={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2013},
  pages={8624--8628}
}
After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are extremely powerful in what they can in principle…
Citations

Learning Multiple Timescales in Recurrent Neural Networks
The results show that partitioning hidden layers under distinct temporal constraints enables the learning of multiple timescales, which contributes to the understanding of the fundamental conditions that allow RNNs to self-organize to accurate temporal abstractions.
Recurrent Neural Networks
The recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers…
Learning Longer Memory in Recurrent Neural Networks
This paper shows that learning longer-term patterns in real data, such as in natural language, is perfectly possible using gradient descent, by using a slight structural modification of the simple recurrent neural network architecture.
On Fast Dropout and its Applicability to Recurrent Networks
This paper analyzes fast dropout, a recent regularization method for generalized linear models and neural networks, from a back-propagation inspired perspective and shows that it implements a quadratic form of an adaptive, per-parameter regularizer, which rewards large weights in the light of underfitting, penalizes them for overconfident predictions and vanishes at minima of an unregularized training loss.
Residual Recurrent Neural Networks for Learning Sequential Representations
The results show that an RNN unit reformulated to learn residual functions with reference to the hidden state gives state-of-the-art performance, outperforms LSTM and GRU layers in terms of speed, and supports accuracy competitive with that of the other methods.
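The residual reformulation summarized above can be sketched as a one-line change to the vanilla RNN transition. This is a hypothetical illustration of the general idea, not the paper's exact parameterization: the new hidden state is the previous state plus a learned residual, so an identity path carries gradients across time steps.

```python
import numpy as np

def vanilla_rnn_step(h_prev, x, W_x, W_h, b):
    """Standard RNN transition: h_t = tanh(x_t W_x + h_{t-1} W_h + b)."""
    return np.tanh(x @ W_x + h_prev @ W_h + b)

def residual_rnn_step(h_prev, x, W_x, W_h, b):
    """Residual transition: the cell learns an update that is *added*
    to h_{t-1}, so the identity path preserves the previous state."""
    return h_prev + np.tanh(x @ W_x + h_prev @ W_h + b)
```

With all weights at zero, the residual step returns the previous state unchanged, whereas the vanilla step collapses it to zero; this is the sense in which the residual form makes long-range state preservation the default rather than something the weights must learn.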
Recent Advances in Recurrent Neural Networks
A survey of RNNs and several recent advances is presented for newcomers and professionals in the field, and the research challenges are introduced.
Sampling-Based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks
An analytical framework is constructed to estimate the contribution of each training example to the norm of the long-term components of the target function's gradient, and is used to hold the norm of the gradients in a range suitable for stochastic gradient descent (SGD) training.
Regularizing Recurrent Networks - On Injected Noise and Norm-based Methods
The remembering and generalization ability of RNNs on polyphonic musical datasets is evaluated, and the evidence leads to the conclusion that training with noise does not improve performance, as had been conjectured by a few earlier works on RNN optimization.
Conditional Computation in Deep and Recurrent Neural Networks
Two cases of conditional computation are explored: in the feed-forward case, a technique is developed that trades off accuracy for potential computational benefits, and in the recurrent case, techniques that yield practical speed benefits on a language modeling task are demonstrated.
A Critical Review of Recurrent Neural Networks for Sequence Learning
The goal of this survey is to provide a self-contained explication of the state of the art of recurrent neural networks, together with a historical perspective and references to primary research.

References

Showing 1–10 of 53 references
Learning Recurrent Neural Networks with Hessian-Free Optimization
This work solves the long-outstanding problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems which may contain long-term data dependencies, and offers a new interpretation of the generalized Gauss-Newton matrix of Schraudolph which is used within the HF approach of Martens.
Extensions of recurrent neural network language model
Several modifications of the original recurrent neural network language model are presented, showing approaches that lead to more than a 15-fold speedup in both the training and testing phases, as well as possibilities for reducing the number of parameters in the model.
Training recurrent neural networks
A new probabilistic sequence model that combines Restricted Boltzmann Machines and RNNs is described; it is more powerful than similar models while being less difficult to train. A random parameter initialization scheme is also described that allows gradient descent with momentum to train RNNs on problems with long-term dependencies.
Learning long-term dependencies with gradient descent is difficult
This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
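The difficulty identified there can be seen numerically: with contractive recurrent weights, the accumulated Jacobian ∂h_t/∂h_0 of a tanh RNN shrinks roughly geometrically with t, so early inputs receive almost no gradient. A minimal sketch, where the dimension and initialization scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Contractive recurrent weight matrix (small spectral norm).
W = rng.normal(scale=0.2 / np.sqrt(n), size=(n, n))

h = rng.normal(size=n)
jac = np.eye(n)  # accumulated Jacobian d h_t / d h_0
norms = []
for _ in range(50):
    h = np.tanh(W @ h)
    step_jac = (1 - h**2)[:, None] * W  # Jacobian of one tanh transition
    jac = step_jac @ jac
    norms.append(np.linalg.norm(jac))
# norms decays rapidly towards zero: the vanishing gradient.
```

With weights large enough to avoid this contraction, the same product can instead grow without bound, which is the exploding-gradient side of the same trade-off.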
Greedy Layer-Wise Training of Deep Networks
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Temporal-Kernel Recurrent Neural Networks
The Temporal-Kernel Recurrent Neural Network is introduced, which is a variant of the RNN that can cope with long-term dependencies much more easily than a standard RNN, and it is shown that the TKRNN develops short-term memory that successfully solves the serial recall task by representing the input string with a stable state of its hidden units.
Context dependent recurrent neural network language model
This paper improves recurrent neural network language model performance by providing a contextual real-valued input vector in association with each word, computed by performing Latent Dirichlet Allocation on a block of preceding text, to convey contextual information about the sentence being modeled.
Understanding the exploding gradient problem
The analysis is used to justify the simple yet effective solution of clipping the norm of the exploded gradient, and the comparison between this heuristic solution and standard SGD provides empirical evidence for the hypothesis that such a heuristic is required to reach state-of-the-art results on a character prediction task and a polyphonic music prediction task.
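Norm clipping as analyzed there amounts to rescaling the gradient whenever its norm crosses a threshold, while leaving its direction unchanged. A minimal NumPy sketch, with function name and threshold chosen for illustration:

```python
import numpy as np

def clip_gradient_norm(grad, threshold):
    """If ||grad||_2 exceeds threshold, rescale grad to have norm exactly
    threshold; otherwise return it unchanged. Direction is preserved."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad
```

In an SGD loop this is applied to the full gradient vector just before the parameter update, so a rare exploding step cannot throw the parameters far from the current solution.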
Hierarchical Recurrent Neural Networks for Long-Term Dependencies
This paper proposes to use a more general type of a priori knowledge, namely that the temporal dependencies are structured hierarchically, which implies that long-term dependencies are represented by variables with a long time scale.
Why Does Unsupervised Pre-training Help Deep Learning?
The results suggest that unsupervised pre-training guides learning towards basins of attraction of minima that support better generalization from the training data set; this evidence supports a regularization explanation for the effect of pre-training.