Large-batch training for LSTM and beyond

@article{You2019LargebatchTF,
  title={Large-batch training for LSTM and beyond},
  author={Yang You and Jonathan Hseu and Chris Ying and J. Demmel and K. Keutzer and Cho-Jui Hsieh},
  journal={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  year={2019}
}
  • Yang You, Jonathan Hseu, Chris Ying, J. Demmel, K. Keutzer, and Cho-Jui Hsieh
  • Published 2019
  • Computer Science, Mathematics
  • Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
  • Large-batch training approaches have enabled researchers to exploit distributed processing and greatly accelerate the training of deep neural networks. However, there are three problems in current large-batch research: (1) Although RNN approaches such as LSTM are widely used in many applications, current large-batch research focuses principally on CNNs. (2) Even for CNNs, there is no automated technique for extending the batch size beyond 8K. (3) To keep the variance in the gradient expectation…
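
  The abstract is truncated above; its third point concerns how the learning rate should scale with the batch size so that the variance of each gradient update stays constant. A minimal sketch of the standard argument (reconstructed here with notation introduced for illustration, not quoted from the paper): for a mini-batch of size $B$ drawn i.i.d., the stochastic gradient is
  $$g_B = \frac{1}{B}\sum_{i=1}^{B} \nabla \ell(x_i; w), \qquad \mathbb{E}[g_B] = \nabla L(w), \qquad \operatorname{Var}[g_B] = \frac{\sigma^2}{B},$$
  so an SGD update $\Delta w = -\eta_B\, g_B$ has variance $\eta_B^2 \sigma^2 / B$. Holding this variance constant as the batch size grows from $b$ to $B$ gives the square-root scaling rule
  $$\frac{\eta_B^2}{B} = \frac{\eta_b^2}{b} \;\Longrightarrow\; \eta_B = \eta_b \sqrt{B/b},$$
  in contrast to the linear scaling rule $\eta_B = \eta_b\,(B/b)$ used in the large-minibatch ImageNet work cited in the references below.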

    Citations

    Publications citing this paper.

    Large Batch Training Does Not Need Warmup

    CITES METHODS

    Training Google Neural Machine Translation on an Intel CPU Cluster

    High-Performance Deep Learning via a Single Building Block

    CITES METHODS

    Distributed Learning of Neural Networks using Independent Subnet Training

    • Binhang Yuan, Anastasios Kyrillidis, Christopher M. Jermaine
    • 2020
    CITES BACKGROUND & METHODS

    Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks

    CITES BACKGROUND

    Demystifying Parallel and Distributed Deep Learning

    CITES METHODS

    References

    Publications referenced by this paper.

    Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

    HIGHLY INFLUENTIAL

    Deep Residual Learning for Image Recognition

    HIGHLY INFLUENTIAL

    Long Short-Term Memory

    HIGHLY INFLUENTIAL

    One weird trick for parallelizing convolutional neural networks

    HIGHLY INFLUENTIAL

    A Stochastic Approximation Method

    HIGHLY INFLUENTIAL

    Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

    • arXiv preprint arXiv:1706.02677
    • 2017
    HIGHLY INFLUENTIAL

    Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning

    • 2012
    HIGHLY INFLUENTIAL