# Gradient descent with momentum - to accelerate or to super-accelerate?

```bibtex
@article{Nakerst2020GradientDW,
  title   = {Gradient descent with momentum - to accelerate or to super-accelerate?},
  author  = {Goran Nakerst and John D Brennan and Masudul Haque},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2001.06472}
}
```

We consider gradient descent with "momentum", a widely used method for loss function minimization in machine learning. This method is often used with "Nesterov acceleration", meaning that the gradient is evaluated not at the current position in parameter space, but at the estimated position after one step. In this work, we show that the algorithm can be improved by extending this "acceleration": by using the gradient at an estimated position several steps ahead rather than just one step…
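The idea in the abstract can be sketched in a few lines. Below is a minimal, illustrative implementation in which the gradient is evaluated at the position estimated `sigma` momentum steps ahead; `sigma = 1` recovers standard Nesterov acceleration and `sigma = 0` plain momentum. The function name, the parameter `sigma`, and the exact form of the lookahead are assumptions for illustration, not necessarily the paper's precise update rule:

```python
import numpy as np

def superaccelerated_gd(grad, x0, lr=0.01, beta=0.9, sigma=3, n_iter=500):
    """Momentum gradient descent with an extended ("super-accelerated")
    lookahead: the gradient is taken at the position extrapolated
    sigma momentum steps ahead. Illustrative sketch, not the paper's code."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(n_iter):
        lookahead = x + sigma * beta * v       # estimated position sigma steps ahead
        v = beta * v - lr * grad(lookahead)    # momentum update with lookahead gradient
        x = x + v
    return x
```

For example, minimizing f(x) = x^2 (gradient 2x) from x0 = 5.0 with `lr=0.1`, `beta=0.9`, `sigma=3` converges toward the minimum at 0.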

