# Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses

```bibtex
@article{Mujika2019DecouplingHR,
  title   = {Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses},
  author  = {Asier Mujika and Felix Weissenberger and A. Steger},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1910.05245}
}
```

Learning long-term dependencies is a key long-standing challenge of recurrent neural networks (RNNs). Hierarchical recurrent neural networks (HRNNs) have been considered a promising approach as long-term dependencies are resolved through shortcuts up and down the hierarchy. Yet, the memory requirements of Truncated Backpropagation Through Time (TBPTT) still prevent training them on very long sequences. In this paper, we empirically show that in (deep) HRNNs, propagating gradients back from…
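The memory cost the abstract refers to can be made concrete with a minimal sketch of TBPTT on a toy scalar RNN: the backward pass needs every activation of the current truncation window, so memory grows linearly with the truncation length K. All names below are illustrative, not from the paper.

```python
import math

# Toy scalar RNN: h_t = tanh(w * h_{t-1} + u * x_t).
# TBPTT with window length K must cache K activations before the
# backward pass can run, so memory scales with K.

def tbptt_window_grad(w, u, xs, h0):
    """Forward over one truncation window, then backprop d(h_K)/dw."""
    hs = [h0]
    for x in xs:                      # forward: cache every activation
        hs.append(math.tanh(w * hs[-1] + u * x))
    cached = len(hs) - 1              # memory cost == window length K
    grad = 0.0
    dh = 1.0                          # d(h_K)/d(h_K)
    for t in range(len(xs), 0, -1):   # backward through the window
        pre = w * hs[t - 1] + u * xs[t - 1]
        dpre = dh * (1.0 - math.tanh(pre) ** 2)
        grad += dpre * hs[t - 1]      # contribution of w at step t
        dh = dpre * w                 # flow to the previous step
    return grad, cached

g, cached = tbptt_window_grad(w=0.5, u=1.0, xs=[0.1] * 8, h0=0.0)
assert cached == 8                    # K activations held in memory
```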

#### References

Showing 1–10 of 35 references

Learning long-term dependencies with gradient descent is difficult

- Computer Science, Medicine
- IEEE Trans. Neural Networks
- 1994

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
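The difficulty this paper analyzes can be sketched in a few lines (illustrative, not the paper's own code): in a scalar RNN h_t = tanh(w * h_{t-1}), the gradient d h_T / d h_0 is a product of T Jacobian factors w * (1 - h_t^2), and with |w| < 1 that product shrinks exponentially with the time span.

```python
import math

# Scalar RNN: h_t = tanh(w * h_{t-1}).  The long-range gradient is a
# product of per-step Jacobians, one factor per time step.

def long_range_gradient(w, h0, T):
    h, grad = h0, 1.0
    for _ in range(T):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)   # chain rule, one factor per step
    return grad

short = abs(long_range_gradient(w=0.9, h0=0.5, T=5))
long = abs(long_range_gradient(w=0.9, h0=0.5, T=50))
assert long < short                 # gradient vanishes with distance
```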

A Clockwork RNN

- Computer Science
- ICML
- 2014

This paper introduces a simple, yet powerful modification to the simple RNN architecture, the Clockwork RNN (CW-RNN), in which the hidden layer is partitioned into separate modules, each processing inputs at its own temporal granularity, making computations only at its prescribed clock rate.
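The scheduling idea can be sketched as follows (a hedged illustration, not the paper's implementation): modules are assigned clock periods such as 1, 2, 4, 8, and a module only recomputes its state at time steps divisible by its period, otherwise keeping its old value.

```python
# Minimal sketch of CW-RNN clock scheduling: which hidden-state
# modules fire at step t, given their exponential clock periods.

def active_modules(t, periods):
    """Return the indices of modules that update at time step t."""
    return [i for i, p in enumerate(periods) if t % p == 0]

periods = [1, 2, 4, 8]               # exponential clock rates
assert active_modules(0, periods) == [0, 1, 2, 3]  # all fire at t=0
assert active_modules(3, periods) == [0]           # only the fastest
assert active_modules(4, periods) == [0, 1, 2]     # t % 4 == 0
```

Slow modules thus see a coarser view of the sequence, which is what lets them carry information across long spans cheaply.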

Hierarchical Recurrent Neural Networks for Long-Term Dependencies

- Computer Science
- NIPS
- 1995

This paper proposes to use a more general type of a priori knowledge, namely that the temporal dependencies are structured hierarchically, which implies that long-term dependencies are represented by variables with a long time scale.

Learning Recurrent Neural Networks with Hessian-Free Optimization

- Computer Science
- ICML
- 2011

This work solves the long-outstanding problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems which may contain long-term data dependencies, and offers a new interpretation of the generalized Gauss-Newton matrix of Schraudolph which is used within the HF approach of Martens.

Hierarchical Multiscale Recurrent Neural Networks

- Computer Science, Mathematics
- ICLR
- 2017

A novel multiscale approach, called the hierarchical multiscale recurrent neural network, is proposed; it can capture the latent hierarchical structure in the sequence by encoding temporal dependencies at different timescales using a novel update mechanism.
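A hedged sketch of that update mechanism (schematic, based on the HM-RNN paper's three operations): each layer picks one of COPY, UPDATE, or FLUSH per step, driven by boundary-detector bits from itself and the layer below.

```python
# Schematic HM-RNN operation selection from boundary bits.

def choose_op(z_below, z_here):
    """Select the per-step operation from the layer-boundary bits."""
    if z_here:                # this layer just closed a segment
        return "FLUSH"
    if z_below:               # lower layer fed a summary upward
        return "UPDATE"
    return "COPY"             # nothing new below: keep the state

assert choose_op(z_below=0, z_here=0) == "COPY"
assert choose_op(z_below=1, z_here=0) == "UPDATE"
assert choose_op(z_below=1, z_here=1) == "FLUSH"
```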

Long Short-Term Memory

- Computer Science, Medicine
- Neural Computation
- 1997

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
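The "constant error carousel" can be sketched in a few lines (illustrative, not the paper's code): because the cell state is updated additively, c_t = f_t * c_{t-1} + i_t * g_t, the gradient d c_T / d c_0 is simply the product of the forget-gate values, and when those sit near 1 the error signal survives 1000+ steps instead of vanishing.

```python
# Gradient of the LSTM cell state across T steps: a product of
# forget-gate values, one factor per step.

def cell_gradient(forget_vals, T):
    grad = 1.0
    for t in range(T):
        grad *= forget_vals[t % len(forget_vals)]
    return grad

assert cell_gradient([1.0], 1000) == 1.0        # carousel: no decay
assert cell_gradient([0.9], 1000) < 1e-40       # ordinary decay
```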

Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning

- Computer Science, Mathematics
- ICML
- 2019

A new approximation algorithm of RTRL, Optimal Kronecker-Sum Approximation (OK), is presented, and it is proved that OK is optimal for a class of approximations of RTRL which includes all approaches published so far.

Learning Complex, Extended Sequences Using the Principle of History Compression

- Computer Science
- Neural Computation
- 1992

A simple principle for reducing the descriptions of event sequences without loss of information is introduced, and this insight leads to the construction of neural architectures that learn to divide and conquer by recursively decomposing sequences.
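The compression principle can be sketched as follows (names here are illustrative): a lower-level predictor consumes the sequence, and only the symbols it fails to predict are passed upward, so the higher level sees a much shorter sequence of "surprising" events.

```python
# History compression: forward only the prediction failures upward.

def compress(sequence, predict):
    """Keep only the (t, symbol) pairs the predictor got wrong."""
    surprises = []
    prev = None
    for t, sym in enumerate(sequence):
        if predict(prev) != sym:      # unexpected -> forward upward
            surprises.append((t, sym))
        prev = sym
    return surprises

# Trivial predictor: "the next symbol repeats the previous one".
seq = list("aaabbbbac")
out = compress(seq, lambda prev: prev)
assert out == [(0, "a"), (3, "b"), (7, "a"), (8, "c")]
```

Runs of repeated symbols are thus compressed away, leaving only the segment boundaries for the higher level to model.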

Z-Forcing: Training Stochastic Recurrent Networks

- Computer Science, Mathematics
- NIPS
- 2017

This work unifies successful ideas from recently proposed architectures into a stochastic recurrent model that achieves state-of-the-art results on standard speech benchmarks such as TIMIT and Blizzard, and competitive performance on sequential MNIST.

The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions

- Mathematics, Computer Science
- Int. J. Uncertain. Fuzziness Knowl. Based Syst.
- 1998

The decaying error flow is theoretically analyzed, methods trying to overcome vanishing gradients are briefly discussed, and experiments comparing conventional algorithms and alternative methods are presented.