PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units

@article{Choi2020PREMAAP,
  title={PREMA: A Predictive Multi-Task Scheduling Algorithm For Preemptible Neural Processing Units},
  author={Yujeong Choi and Minsoo Rhu},
  journal={2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)},
  year={2020},
  pages={220-233}
}
To amortize cost, cloud vendors providing DNN acceleration as a service to end-users employ consolidation and virtualization to share the underlying resources among multiple DNN service requests. This paper makes a case for a "preemptible" neural processing unit (NPU) and a "predictive" multi-task scheduler to meet the latency demands of high-priority inference while maintaining high throughput. We evaluate both the mechanisms that enable NPUs to be preemptible and the policies that utilize…
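To make the abstract's idea concrete, below is a minimal Python sketch, an illustrative assumption rather than the paper's implementation, of a token-based predictive scheduler in this spirit: each pending task accrues tokens in proportion to its priority and its accumulated wait time, and among tasks whose tokens cross an eligibility threshold, the one with the shortest predicted runtime is dispatched first. The Task fields, token formula, and threshold value are all hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    priority: int           # higher = more latency-critical
    est_runtime_ms: float   # predicted isolated execution time
    waited_ms: float = 0.0  # time spent queued so far

    @property
    def tokens(self) -> float:
        # Tokens grow with both priority and wait time, so even
        # low-priority tasks eventually become eligible (no starvation).
        return self.priority * self.waited_ms

def pick_next(ready: list, threshold: float) -> Task:
    """Dispatch the shortest-predicted-runtime task among those whose
    tokens exceed the threshold; if none qualify yet, run the task
    closest to eligibility."""
    eligible = [t for t in ready if t.tokens >= threshold]
    if eligible:
        return min(eligible, key=lambda t: t.est_runtime_ms)
    return max(ready, key=lambda t: t.tokens)

if __name__ == "__main__":
    ready = [
        Task("batch-translation", priority=1, est_runtime_ms=40.0, waited_ms=90.0),
        Task("voice-assistant",   priority=8, est_runtime_ms=5.0,  waited_ms=15.0),
        Task("photo-tagging",     priority=2, est_runtime_ms=20.0, waited_ms=30.0),
    ]
    chosen = pick_next(ready, threshold=100.0)
    print(f"dispatch: {chosen.name} (tokens={chosen.tokens:.0f})")

Under a policy like this, a preemptible NPU would checkpoint the currently running task (for example, at a layer boundary) whenever a newly eligible task outranks it; the paper evaluates such preemption mechanisms alongside the scheduling policy.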
18 Citations
Layerweaver: Maximizing Resource Utilization of Neural Processing Units via Layer-Wise Scheduling
  • Young H. Oh, Seonghak Kim, +7 authors Jae W. Lee
  • 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021
Deadline-Aware Offloading for High-Throughput Accelerators
Lazy Batching: An SLA-aware Batching System for Cloud Machine Learning Inference
DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference
A Reconfigurable Multithreaded Accelerator for Recurrent Neural Networks
Domain-specific Genetic Algorithm for Multi-tenant DNN Accelerator Scheduling
A Multi-Neural Network Acceleration Architecture
Cross-Stack Workload Characterization of Deep Recommendation Systems
Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks
