Inefficiency of Stochastic Gradient Descent with Larger Mini-batches (and More Learners)

Abstract

Stochastic Gradient Descent (SGD) and its variants are among the most important optimization algorithms used in large-scale machine learning. The mini-batch version of stochastic gradient descent is often used in practice to take advantage of hardware parallelism. In this work, we analyze the effect of mini-batch size on SGD convergence for the case of general non…
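To make the mini-batch setting concrete, below is a minimal sketch of mini-batch SGD on a simple least-squares objective. The objective, function names, and hyperparameters are illustrative assumptions, not the setup analyzed in the paper; the point is only to show where the batch size enters the update.

```python
import numpy as np

# Illustrative mini-batch SGD for least squares (assumed toy objective, not the paper's).
def minibatch_sgd(X, y, batch_size=32, lr=0.1, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient averaged over the mini-batch: a larger batch reduces
            # gradient variance but yields fewer parameter updates per epoch.
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
    return w

# Usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=1000)
w_hat = minibatch_sgd(X, y, batch_size=64)
```

The trade-off visible in the sketch is the one the paper studies: increasing `batch_size` parallelizes each gradient computation but reduces the number of sequential updates performed per pass over the data.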
