Optimizing Convergence for Iterative Learning of ARIMA for Stationary Time Series

Kevin Styp-Rekowski, Florian Schmidt, Odej Kao. 2020 IEEE International Conference on Big Data (Big Data).
Forecasting time series in continuous systems is becoming an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA has been applied to a large variety of applications for decades. An online variant of ARIMA applies the Online Newton Step in order to learn the underlying process of the time series. This optimization method has pitfalls concerning computational complexity and convergence. Thus, this work focuses on the computationally less expensive…
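
To make the idea of learning a time series process online more concrete, here is a minimal sketch of fitting autoregressive coefficients one sample at a time. It is an illustration under stated assumptions, not the method of the paper: the abstract is truncated before it names the optimizer, so plain online gradient descent on the squared one-step error stands in here (it is not the Online Newton Step), and the model is a pure AR(p) with no differencing or moving-average terms. The function name `online_ar_ogd`, the learning rate, and the synthetic AR(2) data are all hypothetical choices.

```python
import numpy as np


def online_ar_ogd(series, p=2, lr=0.002):
    """Fit AR(p) coefficients sample by sample with online gradient descent.

    Assumption: plain gradient steps on the squared one-step prediction
    error, used only as an illustrative first-order update.
    """
    series = np.asarray(series, dtype=float)
    w = np.zeros(p)  # coefficient estimate, most recent lag first
    preds = []
    for t in range(p, len(series)):
        x = series[t - p:t][::-1]  # last p observations, newest first
        y_hat = w @ x              # one-step-ahead prediction
        preds.append(y_hat)
        grad = 2.0 * (y_hat - series[t]) * x  # gradient of (y_hat - y_t)^2
        w -= lr * grad             # first-order update, O(p) per step
    return w, np.array(preds)


# Synthetic stationary AR(2) process (hypothetical test data).
rng = np.random.default_rng(0)
n = 5000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal()

w, preds = online_ar_ogd(y, p=2, lr=0.002)
print("estimated coefficients:", w)
```

Each update in this sketch costs O(p), whereas a Newton-style step also maintains a p-by-p matrix. The learning rate has to be small relative to the scale of the series, otherwise the iterates can oscillate or diverge.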

Online ARIMA Algorithms for Time Series Prediction
This paper proposes online learning algorithms for estimating ARIMA models under relaxed assumptions on the noise terms, which makes them suitable for a wider range of applications while remaining computationally efficient.
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
New variants of Adam and AMSGrad, called AdaBound and AMSBound respectively, are provided. They employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD, and a theoretical proof of convergence is given.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
A set of mild sufficient conditions is provided that guarantees convergence for Adam-type methods, and it is proved that under these conditions the methods achieve a convergence rate of order $O(\log{T}/\sqrt{T})$ for nonconvex stochastic optimization.
Improving Generalization Performance by Switching from Adam to SGD
SWATS is a hybrid strategy that begins training with an adaptive method and switches to SGD when appropriate and is capable of closing the generalization gap between SGD and Adam on a majority of the tasks.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal functions that can be chosen in hindsight.
No more pesky learning rates
The proposed method to automatically adjust multiple learning rates so as to minimize the expected error at any one time relies on local gradient variations across samples, making it suitable for non-stationary problems.
On the Convergence Proof of AMSGrad and a New Version
This paper provides an explicit counter-example of a simple convex optimization setting to show the neglected issue in the convergence proof of Adam, and provides a new convergence proof for AMSGrad as the first fix.
Online Importance Weight Aware Updates
This work develops an approach with an invariance property: updating twice with importance weight h is equivalent to updating once with importance weight 2h. Applying this to online active learning yields an extraordinarily fast active learning algorithm that works even in the presence of adversarial noise.
On the momentum term in gradient descent learning algorithms
N. Qian. Neural Networks, 1999.