Corpus ID: 44111521

Approximating Real-Time Recurrent Learning with Random Kronecker Factors

@article{Mujika2018ApproximatingRR,
  title={Approximating Real-Time Recurrent Learning with Random Kronecker Factors},
  author={Asier Mujika and Florian Meier and A. Steger},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.10842}
}
Despite all the impressive advances of recurrent neural networks, sequential data is still in need of better modelling. Truncated backpropagation through time (TBPTT), the learning algorithm most widely used in practice, suffers from the truncation bias, which drastically limits its ability to learn long-term dependencies. The Real-Time Recurrent Learning algorithm (RTRL) addresses this issue, but its high computational requirements make it infeasible in practice. The Unbiased Online Recurrent…
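For orientation, below is a minimal numpy sketch of the exact RTRL recursion for a vanilla tanh RNN. It illustrates why RTRL is costly; it is not the paper's KF-RTRL algorithm, and all names and sizes are chosen here for the example. The influence matrix G_t = dh_t/dθ is carried forward explicitly, so for n hidden units it needs roughly O(n^3) memory and O(n^4) time per step, which is the cost the paper proposes to avoid by approximating the influence matrix with random Kronecker factors.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_h = 3, 8                        # toy input and hidden sizes
k = n_h + n_in + 1                      # parameters per hidden unit: one row of (W_h, W_x, b)
n_params = n_h * k

W_h = rng.normal(0, 0.3, (n_h, n_h))
W_x = rng.normal(0, 0.3, (n_h, n_in))
b = np.zeros(n_h)

h = np.zeros(n_h)
G = np.zeros((n_h, n_params))           # influence matrix dh_t/dtheta: O(n^3) memory

def rtrl_step(h_prev, x):
    """One RNN step plus the exact RTRL update of the influence matrix G."""
    global G
    a = W_h @ h_prev + W_x @ x + b
    h_new = np.tanh(a)
    d = 1.0 - h_new ** 2                                # tanh'(a)
    D = d[:, None] * W_h                                # dh_t/dh_{t-1}
    F = np.zeros((n_h, n_params))                       # immediate Jacobian dh_t/dtheta
    inputs = np.concatenate([h_prev, x, [1.0]])         # what each parameter row multiplies
    for i in range(n_h):
        F[i, i * k:(i + 1) * k] = d[i] * inputs
    G = D @ G + F                                       # O(n^4) time: the RTRL bottleneck
    return h_new

x_t = rng.normal(size=n_in)
h = rtrl_step(h, x_t)
dL_dh = h - np.zeros(n_h)               # gradient of some instantaneous loss w.r.t. h_t
grad_theta = dL_dh @ G                  # online gradient dL_t/dtheta, no backward pass in time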

Citations

Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning
TLDR
A new approximation algorithm of RTRL, Optimal Kronecker-Sum Approximation (OK), is presented, and it is proved that OK is optimal for a class of approximations of RTRL which includes all approaches published so far.
Practical Real Time Recurrent Learning with a Sparse Approximation
TLDR
For highly sparse networks, SnAp with n = 2 remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online; it substantially outperforms other RTRL approximations with comparable costs, such as Unbiased Online Recurrent Optimization.
A Practical Sparse Approximation for Real Time Recurrent Learning
TLDR
The Sparse n-step Approximation (SnAp) to the RTRL influence matrix is introduced, which keeps only the entries that are nonzero within n steps of the recurrent core; it substantially outperforms other RTRL approximations with comparable costs, such as Unbiased Online Recurrent Optimization.
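As a rough, hedged sketch of the simplest (n = 1) case of the idea above, for a dense vanilla tanh RNN where each parameter immediately affects exactly one hidden unit: only one influence value per parameter is kept, and it is propagated with the diagonal of the recurrent Jacobian, bringing the per-step cost down to roughly that of the forward pass. The tanh RNN and all names are assumptions for the example, not taken from the paper.

import numpy as np

rng = np.random.default_rng(1)
n_in, n_h = 3, 8
k = n_h + n_in + 1                      # parameters per hidden unit (row of W_h, W_x, b)
W_h = rng.normal(0, 0.3, (n_h, n_h))
W_x = rng.normal(0, 0.3, (n_h, n_in))
b = np.zeros(n_h)

h = np.zeros(n_h)
J = np.zeros((n_h, k))                  # one retained influence value per parameter

def snap1_step(h_prev, x):
    """RNN step plus a SnAp-1-style sparse influence update (diagonal propagation)."""
    global J
    a = W_h @ h_prev + W_x @ x + b
    h_new = np.tanh(a)
    d = 1.0 - h_new ** 2
    D_diag = d * np.diag(W_h)                                     # diagonal of dh_t/dh_{t-1}
    F = d[:, None] * np.concatenate([h_prev, x, [1.0]])[None, :]  # immediate Jacobian, per row
    J = D_diag[:, None] * J + F                                   # elementwise: ~O(n^2) per step
    return h_new

h = snap1_step(h, rng.normal(size=n_in))
dL_dh = h - np.zeros(n_h)
grad_by_unit = dL_dh[:, None] * J       # approximate dL/dtheta, grouped by hidden unit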
Training Recurrent Neural Networks Online by Learning Explicit State Variables
TLDR
This work reformulates the RNN training objective to explicitly learn state vectors; this breaks the dependence across time and so avoids the need to estimate gradients far back in time.
Adaptively Truncating Backpropagation Through Time to Control Gradient Bias
TLDR
An adaptive TBPTT scheme is proposed that converts the problem from choosing a temporal lag to choosing a tolerable amount of gradient bias, and it is proved that this bias controls the convergence rate of SGD with biased gradients for the authors' non-convex loss.
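For reference, plain truncated BPTT with a fixed truncation length K looks like the PyTorch loop below; K is exactly the temporal lag that the adaptive scheme above would choose from a gradient-bias tolerance instead of fixing it by hand. The bias-control rule itself is not reproduced here, and the model and data are placeholders.

import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=1e-2)

T, K = 64, 8                                   # sequence length, truncation window
x = torch.randn(1, T, 4)                       # dummy input sequence
y = torch.randn(1, T, 1)                       # dummy targets

h = torch.zeros(1, 1, 16)
for t0 in range(0, T, K):
    h = h.detach()                             # cut the graph: gradients stop at the window edge
    out, h = rnn(x[:, t0:t0 + K], h)
    loss = ((readout(out) - y[:, t0:t0 + K]) ** 2).mean()
    opt.zero_grad()
    loss.backward()                            # backpropagates at most K steps into the past
    opt.step()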
Local online learning in recurrent networks with random feedback
TLDR
This work derives an approximation to gradient-based learning that comports with known biological features of the brain, such as causality and locality, and proposes an augmented circuit architecture that allows the RNN to concatenate short-duration patterns into sequences of longer duration.
Training Recurrent Neural Networks via Forward Propagation Through Time
TLDR
Empirically, FPTT outperforms BPTT on a number of well-known benchmark tasks, thus enabling architectures like LSTMs to solve long-range dependency problems; both sequence-to-sequence and terminal-loss problems are considered.
Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations
TLDR
Results are presented that show the P-TNCN's ability to conduct zero-shot adaptation and online continual sequence modeling; it can, in some instances, outperform full BPTT as well as variants such as sparse attentive backtracking.
Online Learning of Recurrent Neural Architectures by Locally Aligning Distributed Representations
TLDR
This work proposes the Parallel Temporal Neural Coding Network, a biologically inspired model trained by the local learning algorithm known as Local Representation Alignment, which aims to resolve the difficulties that plague recurrent networks trained by backpropagation through time.
General Value Function Networks
TLDR
This work formulates a novel RNN architecture, called a General Value Function Network (GVFN), in which each internal state component corresponds to a prediction about the future represented as a value function, and shows that GVFNs are more robust to the truncation level, in many cases requiring only one-step gradient updates.

References

Showing 1-10 of 23 references
Training recurrent networks online without backtracking
TLDR
Preliminary tests on a simple task show that the stochastic approximation of the gradient introduced in the algorithm does not seem to introduce too much noise in the trajectory, compared to maintaining the full gradient, and confirm the good performance and scalability of the Kalman-like version of NoBackTrack.
Unbiased Online Recurrent Optimization
TLDR
The novel Unbiased Online Recurrent Optimization (UORO) algorithm allows for online learning of general recurrent computational graphs such as recurrent network models, and performs well thanks to the unbiasedness of its gradients.
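For orientation, the core of UORO (paraphrased from the published algorithm; the notation here is mine, not the paper's) is a rank-one, random-sign approximation of the RTRL influence matrix G_t = ∂h_t/∂θ. With recurrent Jacobian D_t, immediate Jacobian F_t, a fresh random sign vector ν at each step, and variance-reduction scalars ρ_0, ρ_1:

G_t \approx \tilde{h}_t \tilde{w}_t^{\top}, \qquad
\tilde{h}_t = \rho_0 \, D_t \tilde{h}_{t-1} + \rho_1 \, \nu, \qquad
\tilde{w}_t = \tfrac{1}{\rho_0} \, \tilde{w}_{t-1} + \tfrac{1}{\rho_1} \, F_t^{\top} \nu .

The cross terms vanish in expectation over ν, so \mathbb{E}_\nu[\tilde{h}_t \tilde{w}_t^{\top}] = G_t: the estimator is unbiased but noisy, and it needs only O(n + |θ|) memory instead of the O(n |θ|) of exact RTRL.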
Long Short-Term Memory
TLDR
A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
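A minimal numpy sketch of one LSTM cell step is given below for reference. It uses the now-standard formulation with a forget gate (added after the original 1997 paper, so not the exact cited variant); the additive cell-state path c is the "constant error carousel" mentioned above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """x: input, h: previous hidden state, c: previous cell state.
    W, U, b hold the stacked parameters of the four gates (f, i, o, g)."""
    z = W @ x + U @ h + b                 # shape (4 * n_h,)
    n_h = h.shape[0]
    f = sigmoid(z[0:n_h])                 # forget gate
    i = sigmoid(z[n_h:2 * n_h])           # input gate
    o = sigmoid(z[2 * n_h:3 * n_h])       # output gate
    g = np.tanh(z[3 * n_h:4 * n_h])       # candidate cell update
    c_new = f * c + i * g                 # additive cell-state path (the "carousel")
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_h = 4, 8
W = rng.normal(0, 0.3, (4 * n_h, n_in))
U = rng.normal(0, 0.3, (4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)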
A method for improving the real-time recurrent learning algorithm
TLDR
An improved implementation of the Real-Time Recurrent Learning algorithm is described, which makes it possible to increase the performance of the learning algorithm during the training phase by using some a priori knowledge about the temporal necessities of the problem.
Unbiasing Truncated Backpropagation Through Time
TLDR
Anticipated Reweighted Truncated Backpropagation (ARTBP), an algorithm that keeps the computational benefits of truncated BPTT while providing unbiasedness, is introduced.
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal …
Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations
TLDR
A new computational model for real-time computing on time-varying input is presented that provides an alternative to paradigms based on Turing machines or attractor neural networks; it is based on principles of high-dimensional dynamical systems in combination with statistical learning theory and can be implemented on generic evolved or found recurrent circuitry.
Adam: A Method for Stochastic Optimization
TLDR
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, and provides a regret bound on its convergence rate that is comparable to the best known results under the online convex optimization framework.
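For reference, the "adaptive estimates of lower-order moments" amount to the following bare-bones update, shown here as a sketch with the usual default hyperparameters.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad            # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                  # bias correction for zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(5)
m, v = np.zeros(5), np.zeros(5)
for t in range(1, 101):                           # t starts at 1 for the bias correction
    grad = 2 * (theta - 1.0)                      # gradient of ||theta - 1||^2 (toy objective)
    theta, m, v = adam_step(theta, grad, m, v, t)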
An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories
A novel variant of the familiar backpropagation-through-time approach to training recurrent networks is described. This algorithm is intended to be used on arbitrary recurrent networks that run …
Learning long-term dependencies with gradient descent is difficult
TLDR
This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on to information for long periods.
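The difficulty can be stated in one line: the sensitivity of the hidden state at time t to the state at an earlier time k is a product of Jacobians,

\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}},
\qquad
\Big\| \frac{\partial h_t}{\partial h_k} \Big\| \le \prod_{i=k+1}^{t} \| D_i \|,

so when the Jacobian norms stay below 1 the long-range terms of the gradient decay exponentially (vanish), and when they stay above 1 they grow exponentially (explode); this is the trade-off between stable learning and latching on to information for long periods noted above.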