• Corpus ID: 1463401

Hierarchical Multiscale Recurrent Neural Networks

  title={Hierarchical Multiscale Recurrent Neural Networks},
  author={Junyoung Chung and Sungjin Ahn and Yoshua Bengio},
Learning both hierarchical and temporal representation has been among the long-standing challenges of recurrent neural networks. Multiscale recurrent neural networks have been considered as a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of models can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence. In this paper, we propose a novel multiscale approach, called the… 

Figures and Tables from this paper

Learning to Adaptively Scale Recurrent Neural Networks

Adaptively Scaled Recurrent Neural Networks (ASRNN) are proposed, a simple but efficient way to handle multiscale RNNs that can efficiently adapt scales based on different sequence contexts and yield better performances than baselines without dynamical scaling abilities.

Continuous Learning in a Hierarchical Multiscale Neural Network

A hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network by having a meta-learner update the weights of the lower- level neural network in an online meta-learning fashion.

Learning deep hierarchical and temporal recurrent neural networks with residual learning

It is proved that approximating identity mapping is crucial for optimizing both hierarchical and temporal structures and even for large datasets exploiting parameters for increasing network depth can gain computational benefits with reduced size of the RNN "state".

Recurrent Neural Networks with Mixed Hierarchical Structures and EM Algorithm for Natural Language Processing

A novel approach to identify and learn implicit hierarchical information (e.g., phrases) is proposed, and an EM algorithm is developed to handle the latent indicator layer in training, which further simplifies a text’s hierarchical structure.

Representing Compositionality based on Multiple Timescales Gated Recurrent Neural Networks with Adaptive Temporal Hierarchy for Character-Level Language Models

The experiments show that the use of multiple timescales in a Neural Language Model (NLM) enables improved performance despite having fewer parameters and with no additional computation requirements.

Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory

Experimental results on synthetic and real-world datasets on speech recognition and handwritten characters show that the modular architecture and the incremental training algorithm improve the ability of recurrent neural networks to capture long-term dependencies.

Decoupling Hierarchical Recurrent Neural Networks With Locally Computable Losses

It is empirically show that in (deep) HRNNs, propagating gradients back from higher to lower levels can be replaced by locally computable losses, without harming the learning capability of the network, over a wide range of tasks.

Temporal Pyramid Recurrent Neural Network

Experimental results demonstrate that TP-RNN consistently outperforms existing RNNs for learning long-term and multi-scale dependencies in sequential data.

Stochastic Sequential Neural Networks with Structured Inference

This work proposes a structured and stochastic sequential neural network, which models both the long-term dependencies via recurrent neural networks and the uncertainty in the segmentation and labels via discrete random variables, and presents a bi-directional inference network.

Residual Recurrent Highway Networks for Learning Deep Sequence Prediction Models

R2HN is proposed that contains highways within temporal structure of the network for unimpeded information propagation, thus alleviating gradient vanishing problem and posing as residual learning framework to prevent performance degradation problem.



Hierarchical Recurrent Neural Networks for Long-Term Dependencies

This paper proposes to use a more general type of a-priori knowledge, namely that the temporal dependencies are structured hierarchically, which implies that long-term dependencies are represented by variables with a long time scale.

Sequence Labelling in Structured Domains with Hierarchical Recurrent Neural Networks

This paper presents a hierarchical system, based on the connectionist temporal classification algorithm, for labelling unsegmented sequential data at multiple scales with recurrent neural networks only and shows that the system outperforms hidden Markov models, while making fewer assumptions about the domain.

Architectural Complexity Measures of Recurrent Neural Networks

This paper proposes three architecture complexity measures of RNNs and rigorously proves each measure's existence and computability, and demonstrates that increasing recurrent skip coefficient offers performance boosts on long term dependency problems.

Gated Feedback Recurrent Neural Networks

The empirical evaluation of different RNN units revealed that the proposed gated-feedback RNN outperforms the conventional approaches to build deep stacked RNNs in the tasks of character-level language modeling and Python program evaluation.

On Multiplicative Integration with Recurrent Neural Networks

This work introduces a general and simple structural design called Multiplicative Integration, which changes the way in which information from difference sources flows and is integrated in the computational building block of an RNN, while introducing almost no extra parameters.

Learning long-term dependencies in NARX recurrent neural networks

It is shown that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous (NARX) recurrent neural networks, which have powerful representational capabilities.

Recurrent Highway Networks

A novel theoretical analysis of recurrent networks based on Gersgorin's circle theorem is introduced that illuminates several modeling and optimization issues and improves the understanding of the LSTM cell.

Neural Sequence Chunkers

Experiments show that chunking systems can be superior to the conventional training algorithms for recurrent nets, and a focus is on a class of 2-network systems which try to collapse a self-organizing hierarchy of temporal predictors into a single recurrent network.

On the difficulty of training recurrent neural networks

This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem and validates empirically the hypothesis and proposed solutions.

Segmental Recurrent Neural Networks

Experiments on handwriting recognition and joint Chinese word segmentation/POS tagging show that segmental recurrent neural networks obtain substantially higher accuracies compared to models that do not explicitly represent segments.