Corpus ID: 211171485

Is Local SGD Better than Minibatch SGD?

@article{Woodworth2020IsLS,
  title={Is Local SGD Better than Minibatch SGD?},
  author={Blake E. Woodworth and Kumar Kshitij Patel and Sebastian U. Stich and Zhen Dai and Brian Bullins and H. Brendan McMahan and Ohad Shamir and Nathan Srebro},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.07839}
}
We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex…
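
To make the comparison concrete, below is a minimal sketch in Python contrasting the two methods on a toy quadratic objective. This is not the authors' code; all names (M machines, K local steps per round, R communication rounds, eta step size, x_star) are illustrative assumptions, and the noisy-quadratic setup is only a stand-in for the stochastic objectives analyzed in the paper.

# Minimal sketch: local SGD vs. minibatch SGD on f(x) = 0.5*||x - x_star||^2
# with additive Gaussian gradient noise. Both methods use the same total
# budget of M*K*R stochastic gradients and R communication rounds.
import numpy as np

rng = np.random.default_rng(0)
d = 10                      # problem dimension
x_star = rng.normal(size=d) # minimizer of the quadratic
noise_std = 1.0             # std of the stochastic gradient noise
eta = 0.1                   # step size

def stoch_grad(x):
    """Stochastic gradient of 0.5*||x - x_star||^2 plus Gaussian noise."""
    return (x - x_star) + noise_std * rng.normal(size=d)

def local_sgd(M, K, R):
    """Each of M machines takes K local SGD steps per round;
    the local iterates are averaged at each of the R communication rounds."""
    x = np.zeros(d)
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            x_m = x.copy()
            for _ in range(K):
                x_m -= eta * stoch_grad(x_m)
            local_iterates.append(x_m)
        x = np.mean(local_iterates, axis=0)  # communicate: average iterates
    return x

def minibatch_sgd(M, K, R):
    """Baseline with the same budget: R steps, each using a minibatch of
    M*K stochastic gradients evaluated at the current shared iterate."""
    x = np.zeros(d)
    for _ in range(R):
        g = np.mean([stoch_grad(x) for _ in range(M * K)], axis=0)
        x -= eta * g
    return x

M, K, R = 8, 10, 50
print("local SGD error:    ", np.linalg.norm(local_sgd(M, K, R) - x_star))
print("minibatch SGD error:", np.linalg.norm(minibatch_sgd(M, K, R) - x_star))

Note the difference the paper studies: local SGD evaluates gradients at M diverging local iterates between communications, while minibatch SGD evaluates all M*K gradients at a single shared iterate per round.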

