# Acceleration via Fractal Learning Rate Schedules

@inproceedings{Agarwal2021AccelerationVF, title={Acceleration via Fractal Learning Rate Schedules}, author={Naman Agarwal and Surbhi Goel and Cyril Zhang}, booktitle={ICML}, year={2021} }

In practical applications of iterative first-order optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune. We demonstrate the presence of these subtleties even in the innocuous case when the objective is a convex quadratic. We reinterpret an iterative algorithm from the numerical analysis literature as what we call the Chebyshev learning rate schedule for accelerating vanilla gradient descent, and show that the problem of mitigating…
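The Chebyshev schedule referred to above can be sketched concretely. In the standard construction (this sketch is not the paper's fractal ordering, which concerns how to *permute* these steps for numerical stability), the step sizes for gradient descent on a quadratic whose Hessian spectrum lies in `[mu, L]` are the reciprocals of the roots of the degree-`T` Chebyshev polynomial rescaled to that interval:

```python
import numpy as np

def chebyshev_steps(mu, L, T):
    """Step sizes 1/r_k, where r_k are the roots of the degree-T
    Chebyshev polynomial rescaled from [-1, 1] to [mu, L].
    (Standard construction; the paper's contribution concerns the
    fractal ordering of these steps, not their values.)"""
    k = np.arange(T)
    roots = (L + mu) / 2 + (L - mu) / 2 * np.cos(np.pi * (2 * k + 1) / (2 * T))
    return 1.0 / roots

# Gradient descent on f(x) = 0.5 * x^T A x with a diagonal test Hessian
mu, L, T = 0.1, 10.0, 16
A = np.diag(np.linspace(mu, L, 8))  # eigenvalues span [mu, L]
x0 = np.ones(8)
x = x0.copy()
for eta in chebyshev_steps(mu, L, T):
    x = x - eta * (A @ x)
```

After all `T` steps the error contracts by roughly `1 / cosh(T * arccosh((L + mu) / (L - mu)))`, far faster than the fixed step size `2 / (L + mu)`; however, some individual steps are much larger than `2 / L`, so intermediate iterates can grow. This transient blow-up is exactly the instability that motivates careful orderings of the schedule.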


## 2 Citations

Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

- Mathematics, Computer Science, ArXiv
- 2021

It is proved that the generalization error of a stochastic optimization algorithm can be bounded based on the ‘complexity’ of the fractal structure that underlies its invariant measure.

Super-Acceleration with Cyclical Step-sizes

- Mathematics
- 2021

We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show that under some assumption on the spectral gap of Hessians in machine learning, cyclical step-sizes are provably…

## References

Showing 1-10 of 98 references

Super-Convergence with an Unstable Learning Rate

- Computer Science, ArXiv
- 2021

This note introduces a simple scenario where an unstable learning rate scheme leads to a super fast convergence, with the convergence rate depending only logarithmically on the condition number of the problem.

The order of choice of the iteration parameters in the cyclic Chebyshev iteration method

- Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki
- 1971

Cyclical Learning Rates for Training Neural Networks

- Computer Science, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2017

A new method for setting the learning rate, named cyclical learning rates, is described, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates.
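As a minimal sketch of the cyclical idea summarized above (the triangular policy; the function name and parameters here are illustrative, not an API from the paper), the learning rate ramps linearly from a base value to a maximum and back over each cycle:

```python
def triangular_lr(step, base_lr, max_lr, stepsize):
    """Triangular cyclical learning rate (illustrative sketch):
    the rate rises linearly from base_lr to max_lr over `stepsize`
    iterations, then falls back, repeating every 2 * stepsize steps."""
    cycle = step // (2 * stepsize)
    frac = abs(step / stepsize - 2 * cycle - 1)  # distance from cycle peak, in [0, 1]
    return base_lr + (max_lr - base_lr) * (1 - frac)
```

For example, with `base_lr=0.001`, `max_lr=0.006`, and `stepsize=100`, the rate starts at 0.001, peaks at 0.006 at step 100, and returns to 0.001 at step 200.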

Iterative methods for optimization

- Mathematics, Computer Science, Frontiers in Applied Mathematics
- 1999

Iterative Methods for Optimization does more than cover traditional gradient-based optimization: it is the first book to treat sampling methods, including the Hooke and Jeeves, implicit filtering, MDS, and Nelder and Mead schemes, in a unified way.

Acceleration Methods

- Mathematics, Computer Science, Foundations and Trends® in Optimization
- 2021

This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of…

Characterizing Structural Regularities of Labeled Data in Overparameterized Models

- Computer Science, Mathematics, ICML
- 2021

Two applications of C-scores are presented, helping to understand the dynamics of representation learning and to filter out outliers; other potential applications, such as curriculum learning and active data collection, are also discussed.

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability

- Computer Science, Mathematics, ICLR
- 2021

It is empirically demonstrated that full-batch gradient descent on neural network training objectives typically operates in a regime the authors call the Edge of Stability, which is inconsistent with several widespread presumptions in the field of optimization.

A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization

- Mathematics, Computer Science, AISTATS
- 2020

It is shown that Anderson acceleration with Chebyshev polynomials can achieve the optimal convergence rate, improving the previous result $O(\kappa\ln\frac{1}{\epsilon})$ provided by (Toth and Kelley, 2015) for quadratic functions.

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

- Mathematics, Computer Science, AISTATS
- 2020

A unified analysis is introduced for a large family of variants of proximal stochastic gradient descent which have so far required different intuitions and convergence analyses, have different applications, and have been developed separately in various communities.