Corpus ID: 218596019

# Convergence of Online Adaptive and Recurrent Optimization Algorithms

@article{Masse2020ConvergenceOO,
title={Convergence of Online Adaptive and Recurrent Optimization Algorithms},
author={Pierre-Yves Mass{\'e} and Yann Ollivier},
journal={arXiv: Dynamical Systems},
year={2020}
}
• Published 11 May 2020
• Mathematics
• arXiv: Dynamical Systems
We prove local convergence of several notable gradient descent algorithms used in machine learning, for which standard stochastic gradient descent theory does not apply. This includes, first, online algorithms for recurrent models and dynamical systems, such as *Real-time recurrent learning* (RTRL) and its computationally lighter approximations NoBackTrack and UORO; second, several adaptive algorithms such as RMSProp, online natural gradient, and Adam with $\beta^2\to 1$. Despite local…
1 Citation
Prediction of the Position of External Markers Using a Recurrent Neural Network Trained With Unbiased Online Recurrent Optimization for Safe Lung Cancer Radiotherapy
• Computer Science, Engineering
• ArXiv
• 2021
This research uses nine observation records of the three-dimensional position of three external markers on the chest and abdomen of healthy individuals, breathing during intervals from 73 s to 222 s, to compare the performance of an RNN trained with unbiased online recurrent optimization against an RNN trained with real-time recurrent learning, least mean squares (LMS), and offline linear regression.

#### References

SHOWING 1-10 OF 26 REFERENCES
On the Convergence of Adam and Beyond
• Computer Science, Mathematics
• ICLR
• 2018
It is shown that one cause for such failures is the exponential moving average used in the algorithms, and suggested that the convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients.
Unbiased Online Recurrent Optimization
• Computer Science
• ICLR
• 2018
The novel Unbiased Online Recurrent Optimization (UORO) algorithm allows for online learning of general recurrent computational graphs such as recurrent network models and performs well thanks to the unbiasedness of its gradients.
Adam: A Method for Stochastic Optimization
• Computer Science, Mathematics
• ICLR
• 2015
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Training recurrent networks online without backtracking
• Computer Science, Mathematics
• ArXiv
• 2015
Preliminary tests on a simple task show that the stochastic approximation of the gradient introduced in the algorithm does not seem to introduce too much noise in the trajectory, compared to maintaining the full gradient, and confirm the good performance and scalability of the Kalman-like version of NoBackTrack.
Approximating Real-Time Recurrent Learning with Random Kronecker Factors
• Computer Science, Mathematics
• NeurIPS
• 2018
It is shown that KF-RTRL is an unbiased and memory-efficient online learning algorithm that captures long-term dependencies and almost matches the performance of TBPTT on real-world tasks by training Recurrent Highway Networks on a synthetic string memorization task and on the Penn TreeBank task, respectively.
Gradient calculations for dynamic recurrent neural networks: a survey
The author discusses advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones and presents some "tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks.
Why random reshuffling beats stochastic gradient descent
• Computer Science, Mathematics
• Math. Program.
• 2021
The convergence rate of the random reshuffling method is analyzed and it is shown that when the component functions are quadratics or smooth and the sum function is strongly convex, RR with iterate averaging and a diminishing stepsize converges at rate $\Theta(1/k^{2s})$ with probability one in the suboptimality of the objective value, thus improving upon the $\Omega(1/k)$ rate of SGD.
Curiously Fast Convergence of some Stochastic Gradient Descent Algorithms
Given a finite set of $m$ examples $z_1, \dots, z_m$ and a strictly convex differentiable loss function $\ell(z, \theta)$ defined on a parameter vector $\theta \in \mathbb{R}^d$, we are interested in minimizing the cost…
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
• Mathematics, Computer Science
• SIAM J. Control. Optim.
• 2000
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the…
Gradient Descent Learns Linear Dynamical Systems
• Computer Science, Mathematics
• J. Mach. Learn. Res.
• 2018
We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy…