Corpus ID: 19981582

ChainerMN: Scalable Distributed Deep Learning Framework

@article{Akiba2017ChainerMNSD,
  title={ChainerMN: Scalable Distributed Deep Learning Framework},
  author={Takuya Akiba and Keisuke Fukuda and Shuji Suzuki},
  journal={ArXiv},
  year={2017},
  volume={abs/1710.11351}
}
One of the keys to the breakthroughs deep learning has made in various fields has been the use of large amounts of computing power, centered on GPUs. Harnessing even more computing power through distributed processing is essential not only to make deep learning models bigger and training faster, but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distributed deep learning framework we have developed. We demonstrate that ChainerMN can scale the learning… 
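As a rough sketch of the data-parallel usage pattern the abstract alludes to, the following Python snippet wraps an ordinary Chainer training loop with ChainerMN's communicator, multi-node optimizer, and dataset scattering. The toy MLP, its layer sizes, and the hyperparameters are illustrative assumptions rather than the paper's configuration; the ChainerMN calls follow its documented public API, and the script would be launched with MPI (one process per GPU).

import chainer
import chainer.functions as F
import chainer.links as L
import chainermn

class MLP(chainer.Chain):                       # toy model, purely illustrative
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, 100)       # input size inferred on first call
            self.l2 = L.Linear(100, 10)

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))

comm = chainermn.create_communicator()          # one MPI process per GPU
device = comm.intra_rank                        # GPU index within this node

model = L.Classifier(MLP())
chainer.cuda.get_device_from_id(device).use()
model.to_gpu()

# Wrap a standard Chainer optimizer so gradients are all-reduced across workers.
optimizer = chainermn.create_multi_node_optimizer(
    chainer.optimizers.MomentumSGD(lr=0.01), comm)
optimizer.setup(model)

# Each worker trains on its own shard of the dataset.
train, _ = chainer.datasets.get_mnist()
train = chainermn.scatter_dataset(train, comm)

train_iter = chainer.iterators.SerialIterator(train, batch_size=32)
updater = chainer.training.StandardUpdater(train_iter, optimizer, device=device)
trainer = chainer.training.Trainer(updater, (10, 'epoch'))
trainer.run()

The only distributed-specific pieces are the communicator, the multi-node optimizer wrapper, and the dataset scattering; the rest is a plain single-process Chainer training loop.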

Citations

Usability Study of Distributed Deep Learning Frameworks For Convolutional Neural Networks
TLDR
This paper surveys the various distributed versions of popular deep learning frameworks and provides a qualitative comparison that measures community popularity, functionality, compatibility, and ease-of-use to allow practitioners to make an informed choice on which framework to use when conducting distributed training of deep learning models.
Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools
TLDR
This survey performs a broad and thorough investigation of the challenges, techniques, and tools for scalable DL on distributed infrastructures, and highlights future research trends in DL systems that deserve further study.
BigDL: A Distributed Deep Learning Framework for Big Data
TLDR
This paper presents BigDL (a distributed deep learning framework for Apache Spark), which allows deep learning applications to run on an Apache Hadoop/Spark cluster so as to directly process production data, as part of an end-to-end data analysis pipeline for deployment and management.
swFLOW: A Dataflow Deep Learning Framework on Sunway TaihuLight Supercomputer
  • Han Lin, Zeng Lin, J. M. Diaz, Mingfan Li, Hong An, G. Gao
  • Computer Science
    2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • 2019
Deep learning technology is widely used in many modern fields, and a number of deep learning models and software frameworks have been proposed. However, it is still very difficult to process deep…
An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
TLDR
This work analyzes the compute, communication, and memory requirements of Convolutional Neural Networks (CNNs) to understand the trade-offs between different parallelism approaches with respect to performance and scalability, and concludes that the proposed oracle has an average accuracy of about 86.74% when compared to empirical results, and as high as 97.57% for data parallelism.
Exascale Deep Learning for Climate Analytics
TLDR
Improvements to the software frameworks, input pipeline, and the network training algorithms necessary to efficiently scale deep learning on the Piz Daint and Summit systems are described.
A Survey of Scalable Deep Learning Frameworks
TLDR
This research aims to provide an overview of the relevant, widely used scalable machine learning and deep learning frameworks currently available and to provide the grounds on which researchers can compare and choose the best set of tools for their ML pipeline.
Distributed deep learning system for cancerous region detection on Sunway TaihuLight
TLDR
To explore the potential of distributed training of deep neural networks, several distributed algorithms based on swFlow are implemented on the world-leading supercomputer Sunway TaihuLight, revealing the great opportunity for combining deep learning with HPC systems.
EZLDA: Efficient and Scalable LDA on GPUs
TLDR
EZLDA is introduced, which achieves efficient and scalable LDA training on GPUs through contributions including a three-branch sampling method that exploits the convergence heterogeneity of different tokens to reduce redundant sampling work, and a hierarchical workload balancing solution that addresses the extremely skewed workload imbalance problem.

References

Chainer : a Next-Generation Open Source Framework for Deep Learning
TLDR
Chainer provides a flexible, intuitive, and high-performance means of implementing a full range of deep learning models, including state-of-the-art models such as recurrent neural networks and variational autoencoders.
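As a hedged illustration of Chainer's define-by-run style mentioned above, the toy network below uses ordinary Python control flow in its forward pass, so the computational graph is recorded while the code runs; the model, shapes, and step counts are made up for the example.

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class DynamicNet(chainer.Chain):   # illustrative toy network
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.fc = L.Linear(10, 10)

    def __call__(self, x, n_steps):
        # Plain Python control flow shapes the graph on every call.
        h = x
        for _ in range(n_steps):
            h = F.relu(self.fc(h))
        return h

model = DynamicNet()
x = np.random.rand(4, 10).astype(np.float32)
y = model(x, n_steps=3)            # a 3-layer graph, built during this call
loss = F.sum(y)
loss.backward()                    # backprop through the graph that was recorded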
TensorFlow: A system for large-scale machine learning
TLDR
The TensorFlow dataflow model is described, and the compelling performance that TensorFlow achieves for several real-world applications is demonstrated.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
TLDR
This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but that when these are addressed the trained networks exhibit good generalization and enable training visual recognition models on internet-scale data with high efficiency.
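The recipe behind that result is commonly summarized as a linear learning-rate scaling rule plus a gradual warmup; the helper below is a small sketch of that idea, with the 256-image reference batch and 5-epoch warmup taken from the paper's reported setup and the function name being an invention for this example.

def scaled_learning_rate(base_lr, global_batch_size, epoch,
                         ref_batch=256, warmup_epochs=5):
    """Linear scaling rule with gradual warmup (sketch, not the exact schedule)."""
    target_lr = base_lr * global_batch_size / ref_batch   # scale lr with batch size
    if epoch < warmup_epochs:
        # Ramp linearly from the small base rate up to the scaled target rate.
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    return target_lr

# Example: base_lr=0.1 and a global batch of 8192 gives a target lr of 3.2,
# reached gradually over the first five epochs.
print(scaled_learning_rate(0.1, 8192, epoch=0))   # 0.1
print(scaled_learning_rate(0.1, 8192, epoch=5))   # 3.2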
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
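For concreteness, the core idea in that summary is that a block computes y = F(x) + x, so its layers only have to learn a residual; below is a minimal Chainer-style sketch, where the channel counts are illustrative and batch normalization is omitted for brevity.

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class ResidualBlock(chainer.Chain):    # illustrative, not the paper's exact block
    def __init__(self, channels):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(channels, channels, ksize=3, pad=1)
            self.conv2 = L.Convolution2D(channels, channels, ksize=3, pad=1)

    def __call__(self, x):
        h = F.relu(self.conv1(x))
        h = self.conv2(h)
        return F.relu(h + x)           # identity shortcut: the layers learn F(x)

block = ResidualBlock(16)
x = np.random.rand(1, 16, 8, 8).astype(np.float32)
y = block(x)                           # output has the same shape as x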
Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes
TLDR
It is demonstrated that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs, using several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule.
The MNIST database of handwritten digits
TLDR
The MNIST database of handwritten digits is a standard benchmark of 70,000 labeled digit images that is widely used for training and evaluating image classification models.
Revisiting Distributed Synchronous SGD
TLDR
It is demonstrated that a third approach, synchronous optimization with backup workers, can avoid asynchronous noise while mitigating the effect of the worst stragglers; the approach is empirically validated and shown to converge faster and to better test accuracies.
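A tiny, self-contained sketch of that third approach, under the simplifying assumption that per-worker gradients arrive as an iterable in completion order (the function and variable names here are illustrative, not from the paper):

def aggregate_first_n(gradient_stream, num_required):
    """Synchronous SGD with backup workers (sketch): launch N + b workers per
    step, average the first N gradients that arrive, and drop the stragglers."""
    collected = []
    for grad in gradient_stream:            # gradients arrive in completion order
        collected.append(grad)
        if len(collected) == num_required:  # got N of the N + b launched workers
            break
    # Element-wise average of the collected gradients.
    return [sum(parts) / num_required for parts in zip(*collected)]

# Toy usage: require 4 gradients out of 5 workers (1 backup); each "gradient"
# is just a list of two numbers here.
grads = iter([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0], [9.9, 9.9]])
print(aggregate_first_n(grads, num_required=4))   # [1.0, 2.0]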
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
TLDR
The API design and the system implementation of MXNet are described, and it is explained how the embedding of both symbolic expressions and tensor operations is handled in a unified fashion.