# On the momentum term in gradient descent learning algorithms

@article{Qian1999OnTM, title={On the momentum term in gradient descent learning algorithms}, author={Ning Qian}, journal={Neural networks : the official journal of the International Neural Network Society}, year={1999}, volume={12 1}, pages={ 145-151 } }

## 1,742 Citations

### On the influence of momentum acceleration on online learning

- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016

The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value, and suggest a method to enhance performance in the stochastic setting by tuning the momentum parameter over time.

### Continuous Time Analysis of Momentum Methods

- Computer Science
- J. Mach. Learn. Res.
- 2021

This work focuses on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm, and proves three continuous-time approximations of the discrete algorithms.

### A Global Minimization Algorithm Based on a Geodesic of a Lagrangian Formulation of Newtonian Dynamics

- Mathematics
- Neural Processing Letters
- 2007

A novel adaptive steepest descent method is obtained, and applying the first-order update rule to Rosenbrock- and Griewank-type potentials determines the global minimum in most cases from various initial points.

### Steepest descent with momentum for quadratic functions is a version of the conjugate gradient method

- Physics
- Neural Networks
- 2004

### Momentum Accelerates Evolutionary Dynamics

- Computer Science
- ArXiv
- 2020

This work combines momentum from machine learning with evolutionary dynamics, using information divergences as Lyapunov functions to show that momentum accelerates the convergence of evolutionary dynamics including the replicator equation and Euclidean gradient descent on populations.

### Analysis Of Momentum Methods

- Computer Science, Physics
- ArXiv
- 2019

This work shows that, contrary to popular belief, standard implementations of fixed momentum methods do no more than act to rescale the learning rate, and shows that the momentum method converges to a gradient flow, with a momentum-dependent time-rescaling, using the method of modified equations from numerical analysis.
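The rescaling claim in the summary above can be illustrated with a minimal sketch. It is not taken from the cited paper: the one-dimensional quadratic objective, the function names `heavy_ball` and `gradient_descent`, and the values of `lr`, `beta`, and `steps` are all assumptions chosen for illustration. With a fixed momentum coefficient `beta` and a small learning rate, the momentum iterate ends up close to plain gradient descent run with the rescaled learning rate `lr / (1 - beta)`.

```python
def heavy_ball(grad, x0, lr, beta, steps):
    """Gradient descent with a fixed momentum coefficient beta (illustrative helper)."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # velocity accumulates past gradients
        x = x + v
    return x


def gradient_descent(grad, x0, lr, steps):
    """Plain gradient descent without momentum (illustrative helper)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def grad(x):
    # Gradient of the assumed test objective f(x) = 0.5 * x**2.
    return x


# Assumed hyperparameters, chosen only to make the effect visible.
lr, beta, steps = 1e-3, 0.9, 200

x_momentum = heavy_ball(grad, 1.0, lr, beta, steps)
x_rescaled = gradient_descent(grad, 1.0, lr / (1 - beta), steps)
x_plain = gradient_descent(grad, 1.0, lr, steps)

print(f"momentum, lr={lr}:          {x_momentum:.4f}")
print(f"plain GD, lr={lr / (1 - beta):.4f}: {x_rescaled:.4f}")
print(f"plain GD, lr={lr}:          {x_plain:.4f}")
```

In this toy setting the first two printed values are close to each other and far from the third. The sketch covers only a deterministic quadratic and says nothing about the stochastic or non-convex cases discussed in the listed papers.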

### Convergence of batch gradient learning with smoothing regularization and adaptive momentum for neural networks

- Computer Science
- SpringerPlus
- 2016

Compared with existing algorithms, the novel algorithm yields a sparser network structure: it forces weights to become smaller during training so that they can eventually be removed after training, which simplifies the network structure and lowers operation time.

### Convergence of Momentum-Based Stochastic Gradient Descent

- Computer Science
- 2020 IEEE 16th International Conference on Control & Automation (ICCA)
- 2020

It is proved that the mSGD algorithm is almost surely convergent at each trajectory, and the convergence rate of mSGD is analyzed.

### Just a Momentum: Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problem

- Physics, Computer Science
- ArXiv
- 2021

This work uses dynamical mean field theory techniques to describe analytically the average behaviour of several algorithms including heavy-ball momentum and Nesterov acceleration in a prototypical non-convex model: the (spiked) matrix-tensor model.

### Just a Momentum: Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems

- Physics, Computer Science
- 2021

This work uses dynamical mean field theory techniques to describe analytically the average behaviour of several algorithms including heavy-ball momentum and Nesterov acceleration in a prototypical non-convex model: the (spiked) matrix-tensor model.

## References

### Increased rates of convergence through learning rate adaptation

- Computer Science
- Neural Networks
- 1988

### Learning internal representations

- Computer Science
- COLT '95
- 1995

It is proved that the number of examples required to ensure good generalisation from a representation learner obeys and that gradient descent can be used to train neural network representations; experimental results are reported that provide strong qualitative support for the theoretical results.

### Learning to Solve Random-Dot Stereograms of Dense and Transparent Surfaces with Recurrent Backpropagation

- Computer Science
- 1989

The recurrent backpropagation learning algorithm of Pineda (1987) is used to construct network models with lateral and feedback connections that can solve the correspondence problem for random-dot stereograms.

### Optimal Brain Damage

- Computer Science
- NIPS
- 1989

A class of practical and nearly optimal schemes for adapting the size of a neural network by using second-derivative information to make a tradeoff between network complexity and training set error is derived.

### Predicting the secondary structure of globular proteins using neural network models.

- Biology
- Journal of Molecular Biology
- 1988

### Parallel distributed processing (Vol

- 1986