Three Factors Influencing Minima in SGD

@article{Jastrzebski2017ThreeFI,
  title={Three Factors Influencing Minima in SGD},
  author={Stanislaw Jastrzebski and Zachary Kenton and Devansh Arpit and Nicolas Ballas and Asja Fischer and Yoshua Bengio and Amos J. Storkey},
  journal={CoRR},
  year={2017},
  volume={abs/1711.04623}
}
We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In order to tackle this problem we investigate the previously proposed approximation of SGD by a stochastic differential equation (SDE). We theoretically argue that three factors: learning rate, batch size…
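To make the learning-rate/batch-size interplay concrete, here is a minimal sketch. It is not the authors' code and is not the paper's experiment. It assumes a hypothetical toy model: a one-dimensional quadratic loss L(theta) = 0.5 * h * theta^2 whose per-example gradient is h * theta plus independent Gaussian noise of variance sigma2. The function names `sgd_stationary_variance` and `sgd_sampled_variance` are illustrative only. Under this assumption, minibatch SGD with learning rate eta and batch size B is an AR(1) process with stationary variance eta * sigma2 / (B * h * (2 - eta * h)). When eta * h is small, this depends on eta and B mainly through the ratio eta / B. The commonly used SDE approximation makes the same dependence explicit, because its diffusion term scales with sqrt(eta / B).

```python
import random
import statistics


def sgd_stationary_variance(lr: float, batch_size: int, h: float, sigma2: float) -> float:
    """Exact stationary variance of theta for SGD on a hypothetical toy model.

    Toy model (an assumption for illustration): loss L(theta) = 0.5 * h * theta**2,
    and each per-example gradient is h * theta + eps with eps ~ N(0, sigma2).
    Averaging B per-example gradients gives gradient-noise variance sigma2 / B.
    The update theta <- (1 - lr*h) * theta - lr * noise is an AR(1) process whose
    stationary variance is lr**2 * (sigma2 / B) / (1 - (1 - lr*h)**2).
    """
    a = 1.0 - lr * h
    if abs(a) >= 1.0:
        raise ValueError("SGD does not settle: need 0 < lr * h < 2")
    return lr ** 2 * (sigma2 / batch_size) / (1.0 - a ** 2)


def sgd_sampled_variance(
    lr: float,
    batch_size: int,
    h: float,
    sigma2: float,
    steps: int = 100_000,
    burn_in: int = 2_000,
    seed: int = 0,
) -> float:
    """Simulate minibatch SGD on the same toy model and measure the spread of theta."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    theta = 0.0
    samples = []
    for t in range(steps):
        # Minibatch gradient noise: mean of batch_size independent per-example noises.
        noise = sum(rng.gauss(0.0, sigma) for _ in range(batch_size)) / batch_size
        # Plain SGD step on the noisy minibatch gradient h * theta + noise.
        theta -= lr * (h * theta + noise)
        if t >= burn_in:  # discard the transient from the initial theta = 0
            samples.append(theta)
    return statistics.pvariance(samples)


if __name__ == "__main__":
    # Same lr / B ratio versus raising the learning rate alone.
    for lr, b in [(0.01, 1), (0.04, 4), (0.04, 1)]:
        print(lr, b, sgd_stationary_variance(lr, b, h=1.0, sigma2=1.0))
```

With h = 1 and sigma2 = 1, the pairs (eta, B) = (0.01, 1) and (0.04, 4) give nearly the same stationary variance (about 0.00503 and 0.00510). Raising eta to 0.04 while keeping B = 1 roughly quadruples it. The closed form is exact only for this toy model. In real DNN training the gradient noise need not be Gaussian or independent of theta.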
Highly Cited
This paper has 21 citations.
Recent Discussions
This paper has been referenced on Twitter 13 times over the past 90 days.

7 Figures & Tables

Statistics

[Chart: Citations per Year, 2017 to 2018]

Citation Velocity: 14

Averaging 14 citations per year over the last 2 years.