
Stochastic gradient descent

Known as: Gradient descent in machine learning, SGD (disambiguation), AdaGrad 
Stochastic gradient descent (often shortened to SGD), also known as incremental gradient descent, is a stochastic approximation of the gradient…
Wikipedia
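
For concreteness, here is a minimal sketch of the update the definition above refers to: at each step the weights move against a gradient estimated from a small random mini-batch instead of the full dataset. The least-squares example, function names, and hyperparameters below are illustrative assumptions, not drawn from any specific paper listed here.

    import numpy as np

    def sgd(grad_fn, w0, X, y, lr=0.01, epochs=10, batch_size=32, seed=0):
        """Plain SGD: repeatedly step against a gradient estimated on a mini-batch."""
        rng = np.random.default_rng(seed)
        w = np.array(w0, dtype=float)
        n = len(X)
        for _ in range(epochs):
            order = rng.permutation(n)
            for start in range(0, n, batch_size):
                batch = order[start:start + batch_size]
                w -= lr * grad_fn(w, X[batch], y[batch])  # noisy but cheap gradient estimate
        return w

    # Example objective: mean squared error 0.5 * ||Xw - y||^2 / m,
    # whose gradient is X^T (Xw - y) / m.
    def lsq_grad(w, Xb, yb):
        return Xb.T @ (Xb @ w - yb) / len(Xb)

Because each step touches only a mini-batch, the cost per update is independent of the dataset size, which is what makes SGD attractive for the large-scale settings covered by the papers below.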

Papers overview

Semantic Scholar uses AI to extract papers important to this topic.
Review · 2019
With the breakthroughs in deep learning, recent years have witnessed a boom in artificial intelligence (AI) applications…
Review · 2019
Much of recent machine learning has focused on deep learning, in which neural network weights are trained through variants of…
Review · 2018
In the era of the Internet of Things (IoT), an enormous number of sensing devices collect and/or generate various sensory data…
Highly Cited · 2017
Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also…
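
The warm-restart idea in the 2017 entry above is commonly paired with a cosine-annealed learning rate that is periodically reset to its maximum. The sketch below shows one way such a schedule can be computed per epoch; the parameter names (eta_min, eta_max, T_0, T_mult) and default values are assumptions for illustration, not taken from the snippet shown here.

    import math

    def cosine_warm_restart_lr(epoch, eta_min=0.0, eta_max=0.1, T_0=10, T_mult=2):
        """Cosine-annealed learning rate with periodic warm restarts.
        After each period the rate jumps back to eta_max and the period
        length is multiplied by T_mult."""
        T_i, t = T_0, epoch
        while t >= T_i:          # find which restart period this epoch falls in
            t -= T_i
            T_i *= T_mult
        return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / T_i))

Feeding this schedule into an SGD loop (one call per epoch) produces the characteristic sawtooth learning-rate curve associated with warm restarts.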
Highly Cited · 2013
Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent…
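
The 2013 entry above points at the variance of the stochastic gradient as the cause of slow asymptotic convergence. One widely used remedy, sketched below under the assumption that this is the kind of method the paper develops, corrects each stochastic gradient with a full gradient taken at a periodic snapshot (an SVRG-style update); grad_i(w, i) is assumed to return the gradient of the i-th example's loss.

    import numpy as np

    def variance_reduced_sgd(grad_i, w0, n, lr=0.1, outer_iters=20, inner_iters=None, seed=0):
        """SVRG-style update: stochastic gradient corrected by a snapshot full gradient."""
        rng = np.random.default_rng(seed)
        m = inner_iters or 2 * n
        w = np.array(w0, dtype=float)
        for _ in range(outer_iters):
            w_snap = w.copy()
            full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n  # full gradient at the snapshot
            for _ in range(m):
                i = rng.integers(n)
                # unbiased direction whose variance shrinks as w approaches w_snap and the optimum
                w -= lr * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)
        return w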
Highly Cited · 2011
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine…
Highly Cited · 2011
We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of…
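
The 2011 matrix-factorization entry above describes factoring very large matrices with SGD. Below is a small-scale sketch of that general idea (not the distributed algorithm the paper itself proposes, whose details are not shown here): update one row of each factor per observed entry. Names and hyperparameters are illustrative assumptions.

    import numpy as np

    def sgd_matrix_factorization(entries, n_rows, n_cols, rank=10, lr=0.01, reg=0.05, epochs=20, seed=0):
        """Approximate a sparse matrix by P @ Q.T using SGD on its observed entries.
        `entries` is a list of (i, j, value) triples."""
        rng = np.random.default_rng(seed)
        P = 0.1 * rng.standard_normal((n_rows, rank))
        Q = 0.1 * rng.standard_normal((n_cols, rank))
        for _ in range(epochs):
            for idx in rng.permutation(len(entries)):
                i, j, v = entries[idx]
                p_i = P[i].copy()
                err = v - p_i @ Q[j]                     # residual on one observed entry
                P[i] += lr * (err * Q[j] - reg * p_i)    # regularized SGD step on row i of P
                Q[j] += lr * (err * p_i - reg * Q[j])    # and on row j of Q
        return P, Q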
Highly Cited · 2010
During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of…
Highly Cited · 2010
With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we…
Highly Cited · 2004
Linear prediction methods, such as least squares for regression, logistic regression and support vector machines for…
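
The 2004 entry above concerns linear prediction (least squares, logistic regression, support vector machines) trained at large scale with stochastic gradient methods. A minimal sketch for the logistic-regression case, with labels assumed to be in {-1, +1} and an L2 penalty; all names and hyperparameters are illustrative.

    import numpy as np

    def logistic_sgd(X, y, lr=0.1, reg=1e-4, epochs=5, seed=0):
        """One-example-at-a-time SGD for L2-regularized logistic regression."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            for i in rng.permutation(n):
                margin = y[i] * (X[i] @ w)
                # gradient of log(1 + exp(-margin)) with respect to w, plus the L2 term
                grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + reg * w
                w -= lr * grad
        return w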