
Stochastic gradient descent

Known as: Gradient descent in machine learning, SGD (disambiguation), AdaGrad 
Stochastic gradient descent (often shortened to SGD), also known as incremental gradient descent, is a stochastic approximation of the gradient… 
Wikipedia
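In its simplest form, SGD replaces the full gradient of the objective with the gradient of a single randomly drawn example (or a small mini-batch) at each step. A minimal sketch for least-squares regression, with synthetic data and a fixed step size chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize (1/n) * sum_i (x_i . w - y_i)^2
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)   # parameters to learn
lr = 0.01         # fixed step size (illustrative)

for step in range(20_000):
    i = rng.integers(n)                    # draw one example at random
    grad = 2.0 * (X[i] @ w - y[i]) * X[i]  # gradient of that example's loss
    w -= lr * grad                         # stochastic gradient step

print("distance to w_true:", np.linalg.norm(w - w_true))
```

Each step touches a single example, so the per-step cost is independent of n; the price is gradient noise, which the papers listed below address in different ways.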

Papers overview

Semantic Scholar uses AI to extract papers important to this topic.
Highly Cited
2017
Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also… 
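As a hedged illustration of the restart idea described in this snippet (the exact schedule used in the listed paper is not quoted here), a common form is a cosine-annealed learning rate that is periodically reset to its maximum, with each cycle optionally longer than the last:

```python
import math

def lr_with_warm_restarts(step, lr_min=1e-4, lr_max=0.1, cycle_len=100, cycle_mult=2):
    """Cosine-annealed learning rate with periodic warm restarts.

    cycle_len is the length of the first cycle; each later cycle is
    cycle_mult times longer. All constants are illustrative.
    """
    t, length = step, cycle_len
    while t >= length:       # locate the position inside the current cycle
        t -= length
        length *= cycle_mult
    frac = t / length
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * frac))

# The rate decays within a cycle, then jumps back to lr_max at each restart.
print([round(lr_with_warm_restarts(s), 4) for s in (0, 50, 99, 100, 300)])
```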
Highly Cited
2017
Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One… 
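One common alternative to the centralized design mentioned in this snippet (not necessarily the scheme proposed in the listed paper) is decentralized training, where workers interleave local SGD steps with parameter averaging over ring neighbors instead of synchronizing through a central server. A hedged sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, d = 4, 10
lr = 0.01

# Each worker keeps its own parameter copy and its own data shard.
params = [np.zeros(d) for _ in range(n_workers)]
shards = [rng.normal(size=(200, d)) for _ in range(n_workers)]
targets = [shard @ np.ones(d) for shard in shards]   # true weights are all ones

for step in range(500):
    # 1) One local stochastic gradient step per worker.
    for k in range(n_workers):
        i = rng.integers(len(shards[k]))
        x, y = shards[k][i], targets[k][i]
        params[k] = params[k] - lr * 2.0 * (x @ params[k] - y) * x
    # 2) Gossip averaging with ring neighbors instead of a central parameter server.
    params = [(params[k - 1] + params[k] + params[(k + 1) % n_workers]) / 3.0
              for k in range(n_workers)]

print("worker 0 distance to truth:", np.linalg.norm(params[0] - np.ones(d)))
```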
Highly Cited
2016
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization… 
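Claims of this kind are usually made precise through algorithmic stability; a standard statement of that link from the stability literature (paraphrased here, not quoted from the listed paper) is:

```latex
\textbf{Uniform stability.} A randomized algorithm $A$ is $\epsilon$-uniformly
stable if, for all datasets $S, S'$ of size $n$ differing in one example,
\[
  \sup_{z}\; \mathbb{E}_{A}\bigl[\ell(A(S); z) - \ell(A(S'); z)\bigr] \le \epsilon .
\]
\textbf{Consequence.} Uniform stability bounds the expected generalization gap:
\[
  \bigl|\, \mathbb{E}_{S,A}\bigl[ R(A(S)) - \hat{R}_{S}(A(S)) \bigr] \bigr| \le \epsilon ,
\]
where $R$ is the population risk and $\hat{R}_{S}$ the empirical risk on $S$.
```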
Highly Cited
2013
Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent… 
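The truncated phrase most likely refers to the variance of single-example gradients. As a hedged illustration of one widely used remedy (not necessarily the listed paper's method), an SVRG-style update corrects each stochastic gradient with a periodically recomputed full gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)

def grad_i(w, i):
    """Gradient of the i-th squared-error term."""
    return 2.0 * (X[i] @ w - y[i]) * X[i]

w, lr = np.zeros(d), 0.02
for epoch in range(20):
    w_snap = w.copy()
    full_grad = 2.0 * X.T @ (X @ w_snap - y) / n     # exact gradient at snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Noisy gradient recentred at the snapshot: its variance shrinks as w
        # approaches w_snap, which restores a fast asymptotic rate.
        g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
        w -= lr * g

print("residual:", np.linalg.norm(X @ w - y))
```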
Highly Cited
2012
  • L. Bottou
  • Neural Networks: Tricks of the Trade
  • 2012
  • Corpus ID: 121049
Chapter 1 strongly advocates the stochastic back-propagation method to train neural networks. This is in fact an instance of a… 
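To make the "instance of" point concrete: per-example back-propagation computes the gradient of one example's loss through the network and then applies an ordinary SGD update. A minimal one-hidden-layer sketch (sizes, data, and step size are illustrative, not taken from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X.sum(axis=1))                  # toy regression target

W1 = rng.normal(scale=0.5, size=(3, 16))   # input -> hidden weights
w2 = rng.normal(scale=0.5, size=16)        # hidden -> output weights
lr = 0.05

for step in range(20_000):
    i = rng.integers(len(X))          # one example at a time: "stochastic"
    h = np.tanh(X[i] @ W1)            # forward pass
    err = h @ w2 - y[i]               # d(loss)/d(prediction) for 0.5 * err**2
    # Back-propagation is the chain rule, yielding per-example gradients ...
    grad_w2 = err * h
    grad_W1 = np.outer(X[i], err * w2 * (1.0 - h ** 2))
    # ... and the update itself is plain SGD.
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1
```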
Highly Cited
2011
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine… 
Highly Cited
2011
We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of… 
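As a hedged sketch of how SGD is typically applied to large-scale factorization (the listed paper's distributed scheme is not reproduced here), every observed entry updates only its own row factor and column factor, which is what makes the work easy to partition:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, rank = 100, 80, 5

# Toy observations drawn from a low-rank matrix.
U_true = rng.normal(size=(n_rows, rank))
V_true = rng.normal(size=(n_cols, rank))
obs = {(rng.integers(n_rows), rng.integers(n_cols)) for _ in range(5000)}
vals = {(i, j): float(U_true[i] @ V_true[j]) for (i, j) in obs}

U = 0.1 * rng.normal(size=(n_rows, rank))
V = 0.1 * rng.normal(size=(n_cols, rank))
lr, reg = 0.02, 0.01

for epoch in range(30):
    for (i, j), r in vals.items():
        err = U[i] @ V[j] - r        # residual on one observed entry
        U_i = U[i].copy()            # keep the old row factor for V's update
        U[i] -= lr * (err * V[j] + reg * U[i])
        V[j] -= lr * (err * U_i + reg * V[j])
```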
Highly Cited
2010
During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of… 
Highly Cited
2010
With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we… 
Highly Cited
2004
Linear prediction methods, such as least squares for regression, logistic regression and support vector machines for… 
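Linear predictors of this kind fit SGD naturally because every training example contributes one convex loss term. A minimal logistic-regression sketch (data and step-size schedule are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)  # labels in {0, 1}

w = np.zeros(d)
for t in range(1, 50_001):
    i = rng.integers(n)
    p = 1.0 / (1.0 + np.exp(-(X[i] @ w)))        # predicted probability
    w -= (1.0 / np.sqrt(t)) * (p - y[i]) * X[i]  # step on one log-loss term

accuracy = (((X @ w) > 0).astype(float) == y).mean()
print("training accuracy:", round(float(accuracy), 3))
```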