Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision

Wei Gao, Qi Hu, Zhisheng Ye, Peng Sun, Xiaolin Wang, Yingwei Luo, Tianwei Zhang, Yonggang Wen
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU accelerators have been collectively constructed into a GPU datacenter. An efficient scheduler design for such a GPU datacenter is crucially important to reduce the operational cost and improve resource utilization. However, traditional approaches designed for big data or high-performance computing workloads cannot support DL…


Conflict-Receptive and Prognosis Scheduling in Deep Learning Systems
In the manufacturing industry, the main processes involved are product design, prototyping, product testing, and finally mass production. So, most of the manufacturing industries and…


Characterization and prediction of deep learning workloads in large-scale GPU datacenters
This work performs a large-scale analysis of real-world job traces from SenseTime, characterizing DL jobs and resource management, and introduces a general-purpose framework that manages resources based on historical data.
Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem
MIGserving is an algorithm pipeline that blends a variety of newly designed algorithms and customized classic algorithms, including a heuristic greedy algorithm, Genetic Algorithm (GA), and Monte Carlo Tree Search algorithm (MCTS), and is implemented on Kubernetes.
INFaaS: Automated Model-less Inference Serving
INFaaS is an automated model-less system for distributed inference serving, in which developers simply specify the performance and accuracy requirements of their applications without needing to choose a specific model-variant for each query.
Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud
Cocktail is a cost-effective, ensembling-based model serving framework that employs a distributed proactive autoscaling policy combined with importance sampling to allocate resources efficiently to the models, and reduces the number of models in the ensemble while satisfying the accuracy and latency requirements.
RubberBand: cloud-based hyperparameter tuning
RubberBand is the first framework for cost-efficient, elastic execution of hyperparameter tuning jobs in the cloud. The available parallelism in such jobs changes dynamically over the course of execution, which presents an opportunity to leverage the elasticity of the cloud.
GSLICE: controlled spatial sharing of GPUs for a scalable inference platform
GSLICE virtualizes the GPU by apportioning GPU resources across different Inference Functions (IFs), thus providing isolation and guaranteeing performance. It also develops self-learning and adaptive GPU resource allocation and batching schemes that account for network traffic characteristics while keeping inference latencies below service level objectives.
Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level, and can reduce the cost of training large models in cloud environments by 25%.
Irina: Accelerating DNN Inference with Efficient Online Scheduling
This work proposes the preliminary design of the first online inference task scheduling system, called Irina, that takes completion time under unpredictable workload as its primary objective and can improve average task completion time over TensorFlow Serving scheduling.
JPAS: Job-progress-aware flow scheduling for deep learning clusters
HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline
HyperSched is a dynamic application-level resource scheduler that tracks, identifies, and preferentially allocates resources to the best-performing trials to maximize accuracy by the deadline. It leverages three properties of a hyperparameter search workload overlooked in prior work: trial disposability, progressively identifiable rankings among different configurations, and space-time constraints.