• Corpus ID: 41813361

Flexible Primitives for Distributed Deep Learning in Ray

Authors: Yaroslav Bulatov, Robert Nishihara
Distributed computation is increasingly important for deep learning, and many deep learning frameworks provide built-in support for distributed training. This results in a tight coupling between the neural network computation and the underlying distributed execution, which poses a challenge for the implementation of new communication and aggregation strategies. We argue that decoupling the deep learning framework from the distributed execution framework enables the flexible development of new… 
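The decoupling argued for in the abstract can be illustrated with a minimal sketch. This is not the paper's code and it does not use Ray's API. It assumes a hypothetical `ParameterServer` class and a `compute_gradient` function standing in for a call into any deep learning framework. The aggregation strategy is a plain function, so it can be swapped without touching the model code. In Ray, the server and workers would plausibly be actors or remote tasks, but that mapping is an assumption here.

```python
from typing import Callable, List

Vector = List[float]


def mean_aggregate(grads: List[Vector]) -> Vector:
    """Average gradients element-wise (one possible aggregation strategy)."""
    n = len(grads)
    return [sum(column) / n for column in zip(*grads)]


class ParameterServer:
    """Hypothetical parameter holder; it knows nothing about the model framework."""

    def __init__(self, weights: Vector, lr: float,
                 aggregate: Callable[[List[Vector]], Vector] = mean_aggregate):
        self.weights = list(weights)
        self.lr = lr
        self.aggregate = aggregate  # pluggable aggregation strategy

    def apply(self, grads: List[Vector]) -> Vector:
        """Combine worker gradients and take one gradient-descent step."""
        update = self.aggregate(grads)
        self.weights = [w - self.lr * g for w, g in zip(self.weights, update)]
        return self.weights


def compute_gradient(weights: Vector, target: Vector) -> Vector:
    """Stand-in for a framework call: gradient of 0.5 * ||w - target||^2."""
    return [w - t for w, t in zip(weights, target)]


def train(server: ParameterServer, shards: List[Vector], steps: int) -> Vector:
    """Each shard plays the role of one worker's data; run sequentially here."""
    for _ in range(steps):
        grads = [compute_gradient(server.weights, shard) for shard in shards]
        server.apply(grads)
    return server.weights
```

Replacing `mean_aggregate` with another function (for example, one that drops some gradients) changes the communication strategy while `compute_gradient` stays untouched, which is the separation the abstract describes.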
1 Citation


Democratizing Production-Scale Distributed Deep Learning
An internal service built at Apple from the ground up for easy, fast, and scalable distributed training is described, along with case studies of its internal adoption in the development of autonomous systems.


Large Scale Distributed Deep Networks
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
The API design and the system implementation of MXNet are described, and it is explained how embedding of both symbolic expression and tensor operation is handled in a unified fashion.
Ray: A Distributed Framework for Emerging AI Applications
This paper proposes an architecture that logically centralizes the system's control state using a sharded storage system and a novel bottom-up distributed scheduler that speeds up challenging benchmarks and serves as both a natural and performant fit for an emerging class of reinforcement learning applications and algorithms.
Real-Time Machine Learning: The Missing Pieces
It is asserted that a new distributed execution framework is needed for such ML applications, and a candidate approach is proposed with a proof-of-concept architecture that achieves a 63x performance improvement over a state-of-the-art execution framework for a representative application.
Revisiting Distributed Synchronous SGD
It is demonstrated that a third approach, synchronous optimization with backup workers, can avoid asynchronous noise while mitigating the worst stragglers; the approach is empirically validated and shown to converge faster and to better test accuracies.
TensorFlow: A system for large-scale machine learning
The TensorFlow dataflow model is described and the compelling performance that TensorFlow achieves for several real-world applications is demonstrated.
Scaling Distributed Machine Learning with the Parameter Server
Views on newly identified challenges are shared, and some application scenarios, such as micro-blog data analysis and data processing in building next-generation search engines, are covered.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization and enable training visual recognition models on internet-scale data with high efficiency.
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server
We propose a parameter server system for distributed ML, which follows a Stale Synchronous Parallel (SSP) model of computation that maximizes the time computational workers spend doing useful work on
Probabilistically Bounded Staleness for Practical Partial Quorums
This work explains why partial quorums are regularly acceptable in practice, analyzing both the staleness of data they return and the latency benefits they offer, and introduces Probabilistically Bounded Staleness (PBS) consistency, which provides expected bounds on staleness with respect to both versions and wall clock time.
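
The "synchronous optimization with backup workers" idea summarized under Revisiting Distributed Synchronous SGD above can be sketched as follows. This is an illustrative simulation under stated assumptions, not code from that paper: the function name `aggregate_first_arrivals` and the `(arrival_time, gradient)` tuples are hypothetical. The step launches more workers than it needs, averages only the first `n_required` gradients to arrive, and ignores the stragglers.

```python
from typing import List, Tuple

Vector = List[float]


def aggregate_first_arrivals(
    results: List[Tuple[float, Vector]], n_required: int
) -> Tuple[Vector, float]:
    """Average the n_required earliest gradients; later arrivals are dropped.

    results holds one (arrival_time, gradient) pair per worker, including
    the backup workers. Returns the averaged gradient and the time at which
    the step could proceed (arrival time of the last gradient used).
    """
    if n_required < 1 or n_required > len(results):
        raise ValueError("n_required must be between 1 and the number of workers")
    fastest = sorted(results, key=lambda r: r[0])[:n_required]
    grads = [grad for _, grad in fastest]
    mean = [sum(column) / n_required for column in zip(*grads)]
    step_time = fastest[-1][0]
    return mean, step_time


# Three workers launched, two required: the slow worker (5.0 s) is ignored.
example = [(0.1, [1.0, 1.0]), (5.0, [100.0, 100.0]), (0.2, [3.0, 3.0])]
mean_grad, step_time = aggregate_first_arrivals(example, n_required=2)
```

With `n_required` equal to the number of workers this reduces to ordinary synchronous averaging, where the step time is set by the slowest worker.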