Training Neural Networks Without Gradients: A Scalable ADMM Approach

Gavin Taylor, Ryan Burmeister, Zheng Xu, Bharat Singh, Ankit Patel, Tom Goldstein

With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that don’t scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch methods, suffers from common problems like saturation effects, poor conditioning, and saddle points. This paper…




