Corpus ID: 210714229

Gradient descent with momentum - to accelerate or to super-accelerate?

@article{Nakerst2020GradientDW,
  title={Gradient descent with momentum - to accelerate or to super-accelerate?},
  author={Goran Nakerst and John D Brennan and Masudul Haque},
  journal={ArXiv},
  year={2020},
  volume={abs/2001.06472}
}
  • Goran Nakerst, John D Brennan, Masudul Haque
  • Published in ArXiv 2020
  • Mathematics, Computer Science
  • We consider gradient descent with 'momentum', a widely used method for loss-function minimization in machine learning. This method is often used with 'Nesterov acceleration', meaning that the gradient is evaluated not at the current position in parameter space, but at the estimated position after one step. In this work, we show that the algorithm can be improved by extending this 'acceleration': using the gradient at an estimated position several steps ahead rather than just one step…
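
The abstract describes evaluating the gradient at an estimated position several steps ahead rather than one. Below is a minimal Python sketch (not the authors' code) of how such a lookahead could be parameterized; the name `sigma` and the extrapolation `x + sigma * mu * v` are assumptions based on the abstract, with sigma = 1 recovering the usual Nesterov acceleration and sigma = 0 plain heavy-ball momentum.

```python
import numpy as np

def lookahead_momentum(grad, x0, lr=0.01, mu=0.9, sigma=1.0, n_steps=500):
    """Momentum gradient descent with the gradient taken `sigma` 'steps' ahead.

    sigma = 0 : plain (heavy-ball) momentum
    sigma = 1 : standard Nesterov acceleration
    sigma > 1 : 'super-acceleration' in the sense of the abstract
                (the extrapolation x + sigma * mu * v is an assumed reading
                of "estimated position several steps ahead").
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(n_steps):
        lookahead = x + sigma * mu * v       # estimated future position
        v = mu * v - lr * grad(lookahead)    # velocity update with lookahead gradient
        x = x + v                            # parameter update
    return x

# Toy usage: ill-conditioned quadratic loss f(x) = 0.5 * x^T A x
if __name__ == "__main__":
    A = np.diag([1.0, 50.0])
    grad = lambda x: A @ x
    for sigma in (0.0, 1.0, 3.0):            # plain, Nesterov, super-accelerated
        x_final = lookahead_momentum(grad, x0=[5.0, 5.0], sigma=sigma)
        print(f"sigma={sigma}: |x| = {np.linalg.norm(x_final):.3e}")
```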

