Corpus ID: 219686925

Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting

@article{Chen2020BetterPS,
  title={Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting},
  author={K. Chen and John Langford and Francesco Orabona},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.07507}
}
Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance. In practical applications, however, there remains an empirical gap between tuned stochastic gradient descent (SGD) and PFSGD. In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models. The new update is derived through the solution of an Ordinary Differential Equation… 
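For context, the coin-betting framework the abstract builds on can be illustrated with the standard per-coordinate Krichevsky-Trofimov (KT) bettor. The sketch below is that generic reduction under the usual bounded-gradient assumption, not the ODE-based update introduced in this paper; the names kt_coin_betting and grad_fn, the clipping to [-1, 1], and the initial capital eps are illustrative choices.

    import numpy as np

    def kt_coin_betting(grad_fn, dim, n_steps, eps=1.0):
        # Generic per-coordinate KT coin-betting optimizer (illustrative sketch,
        # not the ODE update of this paper); assumes each gradient coordinate
        # lies in [-1, 1].
        wealth = np.full(dim, eps)      # initial capital per coordinate
        coin_sum = np.zeros(dim)        # running sum of past coin outcomes (-gradients)
        iterates = []
        for t in range(1, n_steps + 1):
            x = coin_sum / t * wealth   # bet the KT fraction of the current wealth
            g = np.clip(grad_fn(x), -1.0, 1.0)
            wealth -= g * x             # gain or lose capital on the coin outcome -g
            coin_sum -= g
            iterates.append(x)
        return np.mean(iterates, axis=0)  # averaged iterate for stochastic problems

For stochastic convex optimization one would typically report the averaged iterate, as done above, rather than the last one.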

Citations

Implicit Parameter-free Online Learning with Truncated Linear Models
TLDR
New parameter-free algorithms are proposed that take advantage of truncated linear models through a new update with an "implicit" flavor: the update is efficient, requires only one gradient per step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
Making SGD Parameter-Free
We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting.
Optimal Parameter-free Online Learning with Switching Cost
TLDR
A simple yet powerful algorithm for Online Linear Optimization (OLO) with switching cost is proposed, which improves the existing suboptimal regret bound [ZCP22a] to the optimal rate.
PDE-Based Optimal Strategy for Unconstrained Online Learning
TLDR
The proposed algorithm is the first to achieve an optimal loss-regret trade-off without the impractical doubling trick, and a matching lower bound shows that the leading-order term, including the constant multiplier √2, is tight.
Learning to Accelerate by the Methods of Step-size Planning
TLDR
It is shown that, for a convex problem, the methods surpass the convergence rate of Nesterov's accelerated gradient, 1 − √(µ/L), where µ and L are the strong convexity parameter of the loss function F and the Lipschitz constant of F′; this rate is the theoretical limit for the convergence rate of first-order methods.

References

SHOWING 1-10 OF 34 REFERENCES
Implicit Parameter-free Online Learning with Truncated Linear Models
TLDR
New parameter-free algorithms are proposed that take advantage of truncated linear models through a new update with an "implicit" flavor: the update is efficient, requires only one gradient per step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
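As a point of reference, the truncated linear model used in this line of work replaces the plain first-order model of the loss with its maximum against a known lower bound, so the model never drops below a value the true loss cannot reach. A minimal sketch, assuming a nonnegative loss and using the hypothetical names truncated_model, f_xt, and g_t:

    import numpy as np

    def truncated_model(f_xt, g_t, x_t, lower_bound=0.0):
        # h(x) = max(f(x_t) + <g_t, x - x_t>, lower_bound): the linear model of f
        # at x_t, truncated at a known lower bound (0 for nonnegative losses).
        x_t = np.asarray(x_t, dtype=float)
        g_t = np.asarray(g_t, dtype=float)
        def h(x):
            return max(f_xt + g_t @ (np.asarray(x, dtype=float) - x_t), lower_bound)
        return h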
Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning
TLDR
This paper proposes a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune, nor any form of cross-validation, to estimate over time the right regularization in a data-dependent way.
Training Deep Networks without Learning Rates Through Coin Betting
TLDR
This paper proposes a new stochastic gradient descent procedure for deep networks that does not require any learning rate setting: the optimization process is reduced to a game of betting on a coin, for which a learning-rate-free optimal algorithm is proposed.
Coin Betting and Parameter-Free Online Learning
TLDR
A new intuitive framework to design parameter-free algorithms for online linear optimization over Hilbert spaces and for learning with expert advice, based on reductions to betting on outcomes of adversarial coins is presented.
Black-Box Reductions for Parameter-free Online Learning in Banach Spaces
We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
TLDR
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
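For comparison with the parameter-free methods above, a minimal sketch of diagonal AdaGrad is given below; it still requires a base learning rate lr, which is exactly the tuning burden parameter-free methods remove. The function names and the defaults lr and eps are illustrative choices, not taken from the cited paper.

    import numpy as np

    def adagrad(grad_fn, x0, n_steps, lr=0.1, eps=1e-8):
        # Diagonal AdaGrad sketch: each coordinate's step is scaled by the inverse
        # square root of its accumulated squared gradients.
        x = np.array(x0, dtype=float)
        sq_sum = np.zeros_like(x)
        for _ in range(n_steps):
            g = np.asarray(grad_fn(x), dtype=float)
            sq_sum += g * g
            x -= lr * g / (np.sqrt(sq_sum) + eps)
        return x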
Parameter-free Stochastic Optimization of Variationally Coherent Functions
TLDR
This work designs and analyzes an algorithm for first-order stochastic optimization of a large class of functions on R^d; the algorithm is an instance of Follow The Regularized Leader with the added twist of using rescaled gradients and time-varying linearithmic regularizers.
Online Learning Without Prior Information
TLDR
This work describes a frontier of new lower bounds on the performance of optimization and online learning algorithms, reflecting a tradeoff between a term that depends on the optimal parameter value and a term that depends on the gradients' rate of growth.
Adaptive scale-invariant online algorithms for learning linear models
TLDR
This paper proposes online algorithms whose predictions are invariant under arbitrary rescaling of the features; they achieve regret bounds matching that of OGD with optimally tuned separate learning rates per dimension, while retaining comparable runtime performance.
Lipschitz and Comparator-Norm Adaptivity in Online Learning
TLDR
Two prior reductions to the unbounded setting are generalized: one to not need hints, and a second to deal with the range-ratio problem (which already arises in prior work).