# Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

@article{Zhang2020WhyGC,
  title   = {Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity},
  author  = {J. Zhang and Tianxing He and S. Sra and A. Jadbabaie},
  journal = {arXiv: Optimization and Control},
  year    = {2020}
}

We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms and often assumed to be constant, in fact varies significantly along the training trajectory of deep neural networks. Further, this smoothness positively…
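The clipping operation the paper analyzes is standard clip-by-norm: when the gradient's L2 norm exceeds a threshold, the gradient is rescaled to that threshold, which is equivalent to shrinking the effective step size on large-gradient (low-smoothness) regions. A minimal plain-Python sketch, with an illustrative function name and threshold not taken from the paper:

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm.

    With step size eta, updating along the clipped gradient is the same as
    using the adaptive step size eta * min(1, max_norm / ||grad||), which is
    the adaptivity the paper's analysis attributes clipping's speedup to.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad

# A gradient with L2 norm 5 clipped to threshold 1:
g = clip_by_norm([3.0, 4.0], 1.0)  # roughly [0.6, 0.8], norm 1.0
```

Small gradients pass through unchanged, so clipping only intervenes where the local smoothness constant (and hence the gradient norm) is large.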

#### Supplemental Code

A PyTorch implementation of the LSTM experiments in the paper, available as a GitHub repository via Papers with Code.

#### 23 Citations

- Stochastic Normalized Gradient Descent with Momentum for Large Batch Training (Computer Science, Mathematics, 2020)
- Understanding the Role of Adversarial Regularization in Supervised Learning (Computer Science, Mathematics, 2020)
- Autoclip: Adaptive Gradient Clipping for Source Separation Networks (Engineering, Computer Science, 2020)
- Characterizing Private Clipped Gradient Descent on Convex Generalized Linear Problems (Computer Science, Mathematics, 2020)
- Convergence Rates of a Momentum Algorithm with Bounded Adaptive Step Size for Nonconvex Optimization (2020)
- Fast AHRS Filter for Accelerometer, Magnetometer, and Gyroscope Combination with Separated Sensor Corrections (Computer Science, Medicine, 2020)

#### References

##### Publications referenced by this paper.

Showing 1-10 of 61 references.

- Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (Computer Science, Mathematics, 2010; highly influential)
- Lower bounds for finding stationary points I (Computer Science, Mathematics, 2020; highly influential)
- A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems (Computer Science, Mathematics, 2009)
- A Proximal Stochastic Gradient Method with Progressive Variance Reduction (Computer Science, Mathematics, 2014)