• Corpus ID: 195345844

Backprop without Learning Rates Through Coin Betting

Francesco Orabona and Tatiana Tommasi
Deep learning methods achieve state-of-the-art performance in many application scenarios. [...] Contrary to previous methods, we neither adapt the learning rates nor make use of the assumed curvature of the objective function. Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario. Theoretical convergence is proven for convex and quasi-convex functions, and empirical evidence shows the advantage of our algorithm…
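
The coin-betting reduction mentioned in the abstract can be illustrated with a small one-dimensional sketch. This is not the algorithm proposed in the paper; it is a minimal version of the Krichevsky-Trofimov (KT) bettor described in the related work "Coin Betting and Parameter-Free Online Learning", under the assumption that subgradients are bounded in [-1, 1]. The function name `kt_coin_betting`, its parameters, and the toy objective |w - 3| are illustrative choices, not taken from the source.

```python
def kt_coin_betting(subgradient, steps, initial_wealth=1.0):
    """Minimize a 1-d convex function with no learning rate.

    Assumption: subgradient(w) always lies in [-1, 1].
    Returns the average of the iterates.
    """
    wealth = initial_wealth  # money the bettor currently holds
    coin_sum = 0.0           # running sum of past coin outcomes
    iterate_sum = 0.0
    for t in range(1, steps + 1):
        beta = coin_sum / t   # KT betting fraction, always in (-1, 1)
        w = beta * wealth     # the signed bet doubles as the iterate
        coin = -subgradient(w)  # negative subgradient acts as the coin
        wealth += coin * w    # win or lose the amount that was bet
        coin_sum += coin
        iterate_sum += w
    return iterate_sum / steps


def abs_subgradient(w, target=3.0):
    """Subgradient of the toy objective |w - target|."""
    return float(w > target) - float(w < target)


if __name__ == "__main__":
    print(kt_coin_betting(abs_subgradient, steps=10_000))
```

The only tunable quantity is the initial wealth, which plays a much milder role than a step size: the bet grows while the coin keeps landing on the same side and shrinks once the iterate overshoots, so the averaged iterate approaches the minimizer without any learning-rate schedule.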


Artificial Constraints and Hints for Unbounded Online Learning
These techniques allow us to design algorithms that adapt optimally to the unknown value of ‖u‖ without requiring knowledge of G, and reduce OCO to online linear optimization (OLO) in which each loss $\ell_t$ must be linear through the use of subgradients.
Online Learning for Changing Environments using Coin Betting
A new meta algorithm that has a strongly-adaptive regret bound that is a factor of $\sqrt{\log(T)}$ better than other algorithms with the same time complexity, where $T$ is the time horizon.
Scale-free adaptive planning for deterministic dynamics & discounted rewards
This work introduces PlaTγPOOS, an adaptive, robust, and efficient alternative to the OLOP (open-loop optimistic planning) algorithm, which dynamically adapts its behavior to both rewards and noise and is immune to two vulnerabilities of OLOP.
Artificial Constraints and Lipschitz Hints for Unconstrained Online Learning
Algorithms are provided that adapt optimally to the unknown value of $\|u\|$ without requiring knowledge of $G$, with bounds that are polynomial in all quantities.


No more pesky learning rates
The proposed method to automatically adjust multiple learning rates so as to minimize the expected error at any one time relies on local gradient variations across samples, making it suitable for non-stationary problems.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal functions that can be chosen in hindsight.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
On the importance of initialization and momentum in deep learning
It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.
Practical Recommendations for Gradient-Based Training of Deep Architectures
  • Yoshua Bengio
  • Computer Science
    Neural Networks: Tricks of the Trade
  • 2012
Overall, this chapter describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks and closes with open questions about the training difficulties observed with deeper architectures.
Coin Betting and Parameter-Free Online Learning
A new intuitive framework to design parameter-free algorithms for online linear optimization over Hilbert spaces and for learning with expert advice, based on reductions to betting on outcomes of adversarial coins is presented.
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
It is proved that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but the analysis is significantly simpler and more intuitive.
Solving large scale linear prediction problems using stochastic gradient descent algorithms
Stochastic gradient descent algorithms on regularized forms of linear prediction methods, related to online algorithms such as perceptron, are studied, and numerical rate of convergence for such algorithms is obtained.
No-Regret Algorithms for Unconstrained Online Convex Optimization
This work presents algorithms that, without prior knowledge, offer near-optimal regret bounds with respect to any choice of ẋ, and proves lower bounds showing that their guarantees are near-optimal in this setting.
Natural Gradient Works Efficiently in Learning
  • S. Amari
  • Computer Science
    Neural Computation
  • 1998
The dynamical behavior of natural gradient online learning is analyzed and is proved to be Fisher efficient, implying that it has asymptotically the same performance as the optimal batch estimation of parameters.