Corpus ID: 3516266

Don't Decay the Learning Rate, Increase the Batch Size

@article{Smith2018DontDT,
  title={Don't Decay the Learning Rate, Increase the Batch Size},
  author={Samuel L. Smith and Pieter-Jan Kindermans and Quoc V. Le},
  journal={ArXiv},
  year={2018},
  volume={abs/1711.00489}
}
It is common practice to decay the learning rate. Here we show one can usually obtain the same learning curve on both training and test sets by instead increasing the batch size during training. This procedure is successful for stochastic gradient descent (SGD), SGD with momentum, Nesterov momentum, and Adam. It reaches equivalent test accuracies after the same number of training epochs, but with fewer parameter updates, leading to greater parallelism and shorter training times. We can further…
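The substitution the abstract describes rests on the SGD "noise scale" g ≈ εN/B (learning rate ε, training-set size N, batch size B): dividing ε by a factor f and multiplying B by f leave g unchanged. The sketch below illustrates this equivalence; the milestones, base learning rate, decay factor, and dataset size are illustrative placeholders, not the paper's experimental settings.

```python
# Illustrative comparison of the two schedules. The noise-scale formula
# g ~ lr * N / B follows the paper's analysis; all numeric values here
# are assumed examples, not the authors' configurations.

def decay_lr_schedule(epoch, base_lr=0.1, batch=128, factor=5,
                      milestones=(60, 120, 160)):
    """Conventional schedule: divide the learning rate by `factor`
    at each milestone epoch, batch size fixed."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr /= factor
    return lr, batch

def grow_batch_schedule(epoch, base_lr=0.1, batch=128, factor=5,
                        milestones=(60, 120, 160)):
    """Equivalent schedule: keep the learning rate fixed and multiply
    the batch size by `factor` at each milestone instead."""
    b = batch
    for m in milestones:
        if epoch >= m:
            b *= factor
    return base_lr, b

def noise_scale(lr, batch, n_train=50_000):
    """Approximate SGD noise scale g ~ lr * N / B."""
    return lr * n_train / batch

# The two schedules keep the noise scale matched at every epoch, which
# is why they trace near-identical learning curves; the batch-growing
# run simply takes fewer parameter updates per epoch late in training.
for epoch in (0, 60, 120, 160):
    g_decay = noise_scale(*decay_lr_schedule(epoch))
    g_grow = noise_scale(*grow_batch_schedule(epoch))
    assert abs(g_decay - g_grow) < 1e-9
```

With the final milestone passed, the decay schedule has cut the learning rate by 125x while the growth schedule has instead raised the batch size from 128 to 16,000, so each run performs far fewer (but larger) parameter updates, which is the source of the parallelism gain the abstract mentions.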

    Citations

    Publications citing this paper (341 total); a selection:

    • Revisiting Small Batch Training for Deep Neural Networks
    • Inefficiency of K-FAC for Large Batch Size Training
    • Scalable and Practical Natural Gradient for Large-Scale Deep Learning
    • Stagewise Enlargement of Batch Size for SGD-based Learning (highly influenced)


    CITATION STATISTICS

    • 30 highly influenced citations

    • An average of 112 citations per year from 2018 through 2020

    References

    Publications referenced by this paper (33 total); highly influential references include:

    • Scaling SGD Batch Size to 32K for ImageNet Training
    • Large Batch Training of Convolutional Networks
    • ImageNet Training in Minutes