Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps…
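The definition above is cut off, so the following is only a minimal sketch of the textbook form of the method, not something taken from this page. It assumes the usual update rule of moving against the gradient, scaled by a fixed step size. The names `gradient_descent`, `grad_f`, `step_size`, and `n_steps`, and the quadratic example function, are hypothetical choices for illustration.

```python
from typing import Callable, List, Sequence


def gradient_descent(
    grad: Callable[[Sequence[float]], Sequence[float]],
    x0: Sequence[float],
    step_size: float = 0.1,
    n_steps: int = 100,
) -> List[float]:
    """Repeatedly step against the gradient, starting from x0.

    Assumes a fixed step size and a fixed number of iterations;
    practical implementations usually add a stopping criterion.
    """
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        # Move each coordinate a small distance in the downhill direction.
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def grad_f(p: Sequence[float]) -> List[float]:
    """Gradient of the illustrative function f(x, y) = (x - 3)^2 + (y + 1)^2."""
    x, y = p
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]


# Starting from the origin, the iterates approach the minimizer (3, -1).
minimum = gradient_descent(grad_f, [0.0, 0.0])
print(minimum)
```

For this convex quadratic the only local minimum is also the global one; on a general function the same loop only finds a local minimum that depends on the starting point and the step size.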

Semantic Scholar uses AI to extract papers important to this topic.

Highly Cited

2013

- Rie Johnson, Tong Zhang
- NIPS
- 2013

Stochastic gradient descent is popular for large scale optimization but has slow convergence asymptotically due to the inherent…

Highly Cited

2012

- Léon Bottou
- Neural Networks: Tricks of the Trade
- 2012

Chapter 1 strongly advocates the stochastic back-propagation method to train neural networks. This is in fact an instance of a…

Highly Cited

2011

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine…

Highly Cited

2011

We provide a novel algorithm to approximately factor large matrices with millions of rows, millions of columns, and billions of…

Highly Cited

2010

- Martin Zinkevich, Markus Weimer, Alexander J. Smola, Lihong Li
- NIPS
- 2010

With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper…

Highly Cited

2007

- Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, Andrew Cotter
- Math. Program.
- 2007

We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector…

Highly Cited

2005

- Christopher J. C. Burges, Tal Shaked, +4 authors Gregory N. Hullender
- ICML
- 2005

We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function…

Highly Cited

1999

Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted…

Highly Cited

1997

- Jyrki Kivinen, Manfred K. Warmuth
- Inf. Comput.
- 1997

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD…

Highly Cited

1994

- Yoshua Bengio, Patrice Y. Simard, Paolo Frasconi
- IEEE Trans. Neural Networks
- 1994

Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or…