# Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity

@article{Zhang2020WhyGC, title={Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity}, author={J. Zhang and Tianxing He and S. Sra and A. Jadbabaie}, journal={arXiv: Optimization and Control}, year={2020} }

We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a concept central to the analysis of first-order optimization algorithms that is often assumed to be a constant, demonstrates significant variability along the training trajectory of deep neural networks. Further, this smoothness positively…
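To make the idea of gradient clipping concrete, here is a minimal sketch in plain Python. It is not the paper's experiment or code: the toy objective `f(x) = x**4`, the step size, the clipping threshold, and the helper names `grad`, `clip`, and `descend` are all assumptions chosen for illustration. The objective is convenient because its curvature (`12 * x**2`) is not bounded by a single constant, so a fixed step size that is safe near the minimum can overshoot badly far from it.

```python
def grad(x):
    # Gradient of the toy objective f(x) = x**4 (hypothetical example function).
    return 4.0 * x ** 3


def clip(g, max_norm):
    # Norm clipping in one dimension: rescale g so that |g| <= max_norm.
    norm = abs(g)
    if norm > max_norm:
        return g * (max_norm / norm)
    return g


def descend(x0, lr, steps, max_norm=None):
    # Gradient descent with a fixed step size; clipping is applied
    # only when max_norm is given.
    x = x0
    for _ in range(steps):
        g = grad(x)
        if max_norm is not None:
            g = clip(g, max_norm)
        x = x - lr * g
        if abs(x) > 1e6:
            # Treat very large iterates as divergence and stop early.
            return float("inf")
    return x


# Same start point and step size; the only difference is clipping.
plain = descend(x0=3.0, lr=0.1, steps=200)
clipped = descend(x0=3.0, lr=0.1, steps=200, max_norm=1.0)
print("without clipping:", plain)
print("with clipping:   ", clipped)
```

With these particular settings the unclipped run overshoots and diverges within a few steps, while the clipped run moves toward the minimizer at 0. In the clipped run the effective step is `lr * max_norm` while the gradient is large and reverts to ordinary gradient descent once the gradient falls below the threshold, which is one simple way to see clipping as an adaptive step size.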

#### Supplemental Code

A PyTorch implementation of the LSTM experiments in the paper "Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity".

