Corpus ID: 221739153

On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

@article{Li2020OnTC,
  title={On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis},
  author={Zhong Li and Jiequn Han and E Weinan and Qianxiao Li},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.07799}
}
We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem of such linear functionals, and characterize the approximation… 
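
As a concrete reading of this setting, here is a minimal formalisation under generic notation (the symbols W, U, c for the linear RNN parameters and ρ for a memory kernel are assumptions, not necessarily the paper's exact notation):

```latex
\begin{aligned}
  \dot h_t &= W h_t + U x_t, \qquad o_t = c^\top h_t
    && \text{(continuous-time linear RNN)} \\
  H_t(\mathbf{x}) &= \int_0^{\infty} \rho(s)^\top x(t-s)\, ds
    && \text{(target linear functional with memory kernel } \rho \text{)}
\end{aligned}
```

Approximation then amounts to how well the RNN-induced kernel c^⊤ e^{Ws} U can match ρ; the decay of ρ is where the "memory" of the target relationship enters.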

Citations

Metric Entropy Limits on Recurrent Neural Network Learning of Linear Dynamical Systems

Path classification by stochastic linear recurrent neural networks

It is argued that these RNNs, modelled in a simplified setting as continuous-time stochastic recurrent neural networks with the identity activation function, are easy to train and robust, and a trade-off phenomenon between accuracy and robustness is shown.
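
One plausible way to write the simplified model described here, as a hedged sketch (the drift, diffusion, and readout symbols below are assumptions rather than the authors' notation):

```latex
dh_t = (W h_t + U x_t)\, dt + \Sigma\, dB_t,
\qquad \text{with a linear readout of } h_T \text{ used for path classification,}
```

where the identity activation makes the drift linear in the hidden state.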

Approximation Theory of Convolutional Architectures for Time Series Modelling

The results reveal that in this new setting, approximation efficiency is characterised not only by memory but also by additional fine structures in the target relationship, which leads to a novel definition of spectrum-based regularity that measures the complexity of temporal relationships under the convolutional approximation scheme.

The Discovery of Dynamics via Linear Multistep Methods and Deep Learning: Error Estimation

This work considers deep network-based LMMs for the discovery of dynamics, using the approximation property of deep networks, and shows, for certain families of LMMs, that the grid error is bounded by the sum of O(h) and the network approximation error.
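
For context, the generic linear multistep relation that such methods build on, with the unknown vector field f replaced by a deep network f_θ trained on the resulting residual (generic coefficients α_j, β_j and step size h; not necessarily the paper's notation):

```latex
\sum_{j=0}^{M} \alpha_j\, x_{n+j} \;=\; h \sum_{j=0}^{M} \beta_j\, f(x_{n+j}),
\qquad f \;\approx\; f_\theta .
```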

Multiscale and Nonlocal Learning for PDEs using Densely Connected RNNs

This work introduces an efficient framework, Densely Connected Recurrent Neural Networks (DC-RNNs), which incorporates a multiscale ansatz and high-order implicit-explicit (IMEX) schemes into the RNN structure design in order to identify analytic representations of multiscale and nonlocal PDEs from discrete-time observations generated by heterogeneous experiments.

Calibrating multi-dimensional complex ODE from noisy data via deep neural networks

This work proposes a two-stage nonparametric approach that recovers the ODE system without being subject to the curse of dimensionality or to the complexity of the ODE structure, and uses this method to simultaneously characterize the growth rate of COVID-19 infection cases across the 50 states of the USA.

PGDOT – Perturbed Gradient Descent Adapted with Occupation Time

The proposed algorithm, perturbed gradient descent adapted with occupation time (PGDOT), is shown to converge at least as fast as the PGD algorithm and is guaranteed to avoid getting stuck at saddle points.
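
Since the comparison point is the perturbed gradient descent (PGD) baseline, here is a minimal PGD sketch in the style of Jin et al. (noise injected when the gradient is small), with placeholder hyperparameters; the occupation-time adaptation that defines PGDOT is not reproduced here.

```python
import numpy as np

def perturbed_gradient_descent(grad, x0, lr=1e-2, eps=1e-3, radius=1e-2,
                               cooldown=50, n_steps=1000, seed=0):
    """Plain perturbed gradient descent: take gradient steps, and when the
    gradient is small (a candidate saddle region) and no recent perturbation
    has occurred, inject noise sampled uniformly from a small ball.
    NOTE: generic PGD sketch; the occupation-time adaptation of PGDOT is
    not reproduced here."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    last_perturb = -cooldown
    for t in range(n_steps):
        g = grad(x)
        if np.linalg.norm(g) < eps and t - last_perturb >= cooldown:
            d = rng.normal(size=x.shape)           # random direction
            d *= radius * rng.random() ** (1.0 / x.size) / np.linalg.norm(d)
            x = x + d                              # uniform-in-ball perturbation
            last_perturb = t
        else:
            x = x - lr * g                         # plain gradient step
    return x

# Toy usage: escape the saddle of f(x, y) = x**2 - y**2 at the origin.
grad_f = lambda z: np.array([2.0 * z[0], -2.0 * z[1]])
print(perturbed_gradient_descent(grad_f, x0=[1e-6, 1e-6], n_steps=500))
```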

On the approximation properties of recurrent encoder-decoder architectures

A theoretical understanding of the approximation properties of the recurrent encoder-decoder architecture is provided, which precisely characterises, in the considered setting, the types of temporal relationships that can be efficiently learned.

Deep Neural Network Approximation of Invariant Functions through Dynamical Systems

Sufficient conditions are proved for the universal approximation, by a controlled equivariant dynamical system, of functions that are invariant with respect to certain permutations of the input indices; such systems can be viewed as a general abstraction of deep residual networks with symmetry constraints.

References

Showing 1-10 of 93 references

Understanding Recurrent Neural Networks Using Nonequilibrium Response Theory

S. H. Lim, J. Mach. Learn. Res., 2021
This work derives a Volterra type series representation for a class of continuous-time stochastic RNNs (SRNNs) driven by an input signal and shows that the SRNNs can be viewed as kernel machines operating on a reproducing kernel Hilbert space associated with the response feature.
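
For reference, a Volterra-type series represents the input-output map as a sum of iterated convolutions against kernels K_n (generic form; the cited work expresses these kernels through response functions):

```latex
y(t) \;=\; K_0 \;+\; \sum_{n \ge 1} \int_0^{\infty} \!\cdots\! \int_0^{\infty}
  K_n(s_1, \dots, s_n)\, x(t - s_1) \cdots x(t - s_n)\, ds_1 \cdots ds_n .
```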

On the Convergence Rate of Training Recurrent Neural Networks

It is shown that when the number of neurons is sufficiently large, meaning polynomial in the training data size and in the linear convergence rate, SGD is capable of minimizing the regression loss at a linear convergence rate, which gives theoretical evidence of how RNNs can memorize data.

AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks

This paper draws connections between recurrent networks and ordinary differential equations and proposes a special form of recurrent network called AntisymmetricRNN, which is able to capture long-term dependencies thanks to the stability property of its underlying differential equation.
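
A minimal sketch of the antisymmetric update described there, in a forward-Euler discretisation (the step size eps and damping gamma are placeholder hyperparameters):

```python
import numpy as np

def antisymmetric_rnn_step(h, x, W, V, b, eps=0.1, gamma=0.01):
    """One AntisymmetricRNN step (forward-Euler discretisation of an ODE):
    the effective recurrent matrix (W - W^T) - gamma*I is antisymmetric up
    to a small damping term, which keeps the hidden dynamics stable and
    helps preserve long-term dependencies."""
    A = (W - W.T) - gamma * np.eye(W.shape[0])
    return h + eps * np.tanh(A @ h + V @ x + b)

# Toy usage: hidden size 4, input size 3, a length-10 input sequence.
rng = np.random.default_rng(0)
W, V, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
h = np.zeros(4)
for x in rng.normal(size=(10, 3)):
    h = antisymmetric_rnn_step(h, x, W, V, b)
print(h)
```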

Interpreting Recurrent Neural Networks Behaviour via Excitable Network Attractors

A novel methodology is proposed that provides a mechanistic interpretation of behaviour when solving a computational task using mathematical constructs called excitable network attractors, which are invariant sets in phase space composed of stable attractors and excitable connections between them.

DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators

This work proposes deep operator networks (DeepONets) to learn operators accurately and efficiently from a relatively small dataset, and demonstrates that DeepONet significantly reduces the generalization error compared to fully connected networks.
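
The branch-trunk decomposition at the core of DeepONet can be summarised as follows (sensor points x_1, …, x_m and width p in the usual notation; the bias b_0 is optional):

```latex
G(u)(y) \;\approx\; \sum_{k=1}^{p} b_k\big(u(x_1), \dots, u(x_m)\big)\, t_k(y) \;+\; b_0 ,
```

with b_k produced by the branch net from samples of the input function u, and t_k by the trunk net from the query location y.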

Model Reduction with Memory and the Machine Learning of Dynamical Systems

A natural analogy between recurrent neural networks and the Mori-Zwanzig formalism is explored to establish a systematic approach for developing reduced models with memory, which can produce reduced models with good performance on both short-term prediction and long-term statistical properties.
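
The Mori-Zwanzig reduction behind this analogy yields a generalized Langevin equation for the resolved variables φ, whose memory integral plays the role of the RNN's hidden-state recursion (schematic, linear-projection form with generic notation):

```latex
\frac{d\varphi}{dt}(t) \;=\; R\big(\varphi(t)\big)
  \;+\; \int_0^{t} K(t-s)\,\varphi(s)\, ds \;+\; F(t),
```

with Markovian term R, memory kernel K, and orthogonal (noise) term F.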

Improving performance of recurrent neural network with relu nonlinearity

This paper offers a simple dynamical-systems perspective on the weight initialization process, which motivates a modified weight initialization strategy, and shows that this initialization technique leads to the successful training of RNNs composed of ReLUs.

Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN

It is shown that an IndRNN can be easily regulated to prevent the exploding and vanishing gradient problems while allowing the network to learn long-term dependencies, and that it can work with non-saturating activation functions such as ReLU while still being trained robustly.
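
A minimal sketch of the IndRNN update: the recurrent weight is a vector applied elementwise, so each neuron has an independent recurrence, and bounding its entries controls the recurrent gain (the sizes and clipping range below are placeholders):

```python
import numpy as np

def indrnn_step(h, x, W, u, b):
    """One IndRNN step: h_t = relu(W @ x_t + u * h_{t-1} + b).
    The recurrent weight u is a vector applied elementwise, so every neuron
    has its own independent recurrence; clipping |u| bounds the recurrent
    gain and thereby regulates exploding/vanishing gradients."""
    return np.maximum(0.0, W @ x + u * h + b)

# Toy usage: hidden size 4, input size 3, recurrent gains clipped to [-1, 1].
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
u = np.clip(rng.normal(size=4), -1.0, 1.0)
b = np.zeros(4)
h = np.zeros(4)
for x in rng.normal(size=(10, 3)):
    h = indrnn_step(h, x, W, u, b)
print(h)
```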

Long Short-Term Memory

A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
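
For reference, the standard LSTM cell in modern notation (σ is the logistic sigmoid, ⊙ the elementwise product; the forget gate f_t was added after the original 1997 formulation):

```latex
\begin{aligned}
  i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
  f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
  o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
  \tilde c_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
  c_t &= f_t \odot c_{t-1} + i_t \odot \tilde c_t, &
  h_t &= o_t \odot \tanh(c_t).
\end{aligned}
```
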
...