On the momentum term in gradient descent learning algorithms

@article{Qian1999OnTM,
  title={On the momentum term in gradient descent learning algorithms},
  author={N. Qian},
  journal={Neural networks : the official journal of the International Neural Network Society},
  year={1999},
  volume={12},
  number={1},
  pages={145--151}
}
  • N. Qian
  • Published 1999
  • Computer Science, Mathematics, Medicine
  • Neural networks : the official journal of the International Neural Network Society
  • A momentum term is usually included in the simulations of connectionist learning algorithms. Although it is well known that such a term greatly improves the speed of learning, there have been few rigorous studies of its mechanisms. In this paper, I show that in the limit of continuous time, the momentum parameter is analogous to the mass of Newtonian particles that move through a viscous medium in a conservative force field. The behavior of the system near a local minimum is equivalent to a set…
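The mass analogy in the abstract can be made concrete with a small sketch. It is not code from the paper. It assumes the common heavy-ball form of the update, dw_t = -lr * grad E(w_t) + momentum * dw_{t-1}, and a simple finite-difference discretization of m*w'' + mu*w' = -grad E(w) with time step dt. Under those assumptions the learning rate is dt^2 / (m + mu*dt) and the momentum parameter is m / (m + mu*dt), so zero mass gives plain gradient descent. The names `physical_to_optimizer`, `descend`, `grad`, and `energy`, and the quadratic test function, are hypothetical and chosen only for illustration.

```python
def physical_to_optimizer(mass, friction, dt):
    """Map (mass, friction, time step) to (learning rate, momentum).

    Assumption: m*w'' + mu*w' = -grad E(w) is discretized with a central
    difference for w'' and a forward difference for w'.
    """
    denom = mass + friction * dt
    return dt * dt / denom, mass / denom


def descend(grad, w0, lr, momentum, steps):
    """Gradient descent with a momentum term (heavy-ball form)."""
    w = list(w0)
    dw = [0.0] * len(w)  # previous weight change, zero at the start
    for _ in range(steps):
        g = grad(w)
        # New step = gradient step plus a fraction of the previous step.
        dw = [-lr * gi + momentum * di for gi, di in zip(g, dw)]
        w = [wi + di for wi, di in zip(w, dw)]
    return w


# Hypothetical ill-conditioned quadratic: E(w) = 0.5 * (w[0]**2 + 25 * w[1]**2)
def grad(w):
    return [w[0], 25.0 * w[1]]


def energy(w):
    return 0.5 * (w[0] ** 2 + 25.0 * w[1] ** 2)


start = [1.0, 1.0]
plain = descend(grad, start, lr=0.04, momentum=0.0, steps=100)
heavy = descend(grad, start, lr=0.04, momentum=0.5, steps=100)
print("final energy without momentum:", energy(plain))
print("final energy with momentum:   ", energy(heavy))
```

On this particular quadratic, with the same learning rate and number of steps, the run with momentum ends at a lower energy, because the extra term speeds up progress along the shallow direction. The settings are illustrative only. Other learning rates or momentum values can behave differently, including diverging.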
