Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization


Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used in training deep neural networks and have achieved many successes in practice recently. However, existing theories cannot explain their convergence and speedup properties, mainly due to the nonconvexity of most deep learning formulations and the asynchronous parallel mechanism…
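To make the asynchronous parallel mechanism concrete, below is a minimal toy sketch of shared-memory, lock-free asynchronous SG (Hogwild-style): several workers repeatedly read a possibly stale snapshot of the shared parameters, compute a stochastic gradient, and write their update without synchronization. The nonconvex objective, step size, and worker count here are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
# Toy sketch of shared-memory asynchronous SG, assuming a simple
# nonconvex objective f(x) = sum_i (1 - cos(a_i^T x)).
import threading
import numpy as np

A = np.random.default_rng(0).normal(size=(1000, 10))  # data defining the stochastic components
x = np.zeros(10)                                       # shared parameter vector, updated without locks

def stochastic_grad(x_snapshot, i):
    """Gradient of the i-th component 1 - cos(a_i^T x) at a possibly stale snapshot."""
    a = A[i]
    return np.sin(a @ x_snapshot) * a

def worker(seed, num_steps=2000, step_size=0.01):
    rng = np.random.default_rng(seed)
    for _ in range(num_steps):
        i = rng.integers(len(A))              # sample one stochastic component uniformly
        snapshot = x.copy()                   # read may be stale relative to other workers
        g = stochastic_grad(snapshot, i)
        np.subtract(x, step_size * g, out=x)  # lock-free, in-place asynchronous update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"final loss: {np.sum(1.0 - np.cos(A @ x)):.3f}")
```

The staleness of each worker's snapshot is exactly the quantity the paper's analysis must control; its result bounds the allowable number of workers (roughly the square root of the iteration count) so that such delayed updates still yield a linear speedup.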


