Corpus ID: 4714223

A disciplined approach to neural network hyper-parameters: Part 1 - learning rate, batch size, momentum, and weight decay

@article{Smith2018ADA,
  title={A disciplined approach to neural network hyper-parameters: Part 1 - learning rate, batch size, momentum, and weight decay},
  author={Leslie N. Smith},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.09820}
}
  • Leslie N. Smith
  • Published in ArXiv 2018
  • Mathematics, Computer Science
  • Although deep learning has produced dazzling successes for image, speech, and video processing applications in the past few years, most training runs use suboptimal hyper-parameters, requiring unnecessarily long training times. Setting the hyper-parameters remains a black art that takes years of experience to acquire. This report proposes several efficient ways to set the hyper-parameters that significantly reduce training time and improve performance. Specifically, this report shows…
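
    The report's central recipe, building on the author's earlier cyclical-learning-rate work, pairs a cyclical learning rate with an inversely cyclical momentum (the "1cycle" policy). The short Python sketch below illustrates only the general shape of such a schedule; the function name, step count, and default ranges are illustrative assumptions, not values prescribed by the report.

def one_cycle(step, total_steps, lr_min=1e-4, lr_max=1e-2,
              mom_min=0.85, mom_max=0.95):
    """Return (learning_rate, momentum) for a given training step.

    The learning rate ramps linearly from lr_min up to lr_max over the
    first half of training and back down over the second half, while
    momentum moves in the opposite direction (high when the learning
    rate is low, low when it is high). All defaults are illustrative.
    """
    half = total_steps / 2.0
    frac = step / half if step <= half else (total_steps - step) / half
    lr = lr_min + frac * (lr_max - lr_min)
    momentum = mom_max - frac * (mom_max - mom_min)
    return lr, momentum

if __name__ == "__main__":
    # Print the schedule at a few points of a hypothetical 1000-step run.
    for s in (0, 250, 500, 750, 1000):
        lr, mom = one_cycle(s, 1000)
        print(f"step {s:4d}: lr={lr:.5f}  momentum={mom:.3f}")

    In the report, the maximum learning rate for such a schedule is chosen with an LR range test: the learning rate is swept upward during a short trial run, and the peak is set just below the point where the loss starts to diverge.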

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 108 CITATIONS

    Amplifying the Analyst: Machine Learning Approaches for Buried Utility Characterization

    VIEW 5 EXCERPTS
    CITES BACKGROUND
    HIGHLY INFLUENCED

    Determining Political Inclination in Tweets Using Transfer Learning

    VIEW 5 EXCERPTS
    CITES METHODS
    HIGHLY INFLUENCED

    Mish: A Self Regularized Non-Monotonic Neural Activation Function

    VIEW 3 EXCERPTS
    CITES METHODS
    HIGHLY INFLUENCED

    Product Categorization with LSTMs and Balanced Pooling Views

    VIEW 4 EXCERPTS
    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule

    VIEW 5 EXCERPTS
    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Combining learning rate decay and weight decay with complexity gradient descent - Part I

    VIEW 3 EXCERPTS
    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    MultiFiT: Efficient Multi-lingual Language Model Fine-tuning

    VIEW 2 EXCERPTS
    CITES METHODS
    HIGHLY INFLUENCED

    Generation of Artificial Training Data for Deep Learning

    VIEW 3 EXCERPTS
    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    CITATION STATISTICS

    • 9 Highly Influenced Citations

    • Averaged 44 citations per year from 2018 through 2019

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 18 REFERENCES

    Cyclical Learning Rates for Training Neural Networks

    • Leslie N. Smith
    • Computer Science
    • 2017 IEEE Winter Conference on Applications of Computer Vision (WACV)
    • 2017
    VIEW 6 EXCERPTS

    Practical Recommendations for Gradient-Based Training of Deep Architectures

    • Yoshua Bengio
    • Mathematics, Computer Science
    • Neural Networks: Tricks of the Trade
    • 2012
    VIEW 7 EXCERPTS
    HIGHLY INFLUENTIAL

    Super-convergence: very fast training of neural networks using large learning rates

    VIEW 2 EXCERPTS

    Regularization for Deep Learning: A Taxonomy

    VIEW 2 EXCERPTS

    Three Factors Influencing Minima in SGD

    VIEW 2 EXCERPTS