Enabling preemptive multiprogramming on GPUs

  • Ivan Tanasić, Isaac Gelado, Javier Cabezas, Alex Ramírez, Nacho Navarro, Mateo Valero
  • Published 16 October 2014
  • Computer Science
  • 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)
GPUs are being increasingly adopted as compute accelerators in many domains, spanning environments from mobile systems to cloud computing. These systems usually run multiple applications, from one or several users. However, GPUs do not provide the resource-sharing support traditionally expected in these scenarios. Such systems are therefore unable to meet key multiprogrammed-workload requirements, such as responsiveness, fairness, or quality of service. In this paper, we propose a set… 
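The abstract describes hardware support for preempting running kernels so that multiple applications can share a GPU fairly. As a purely illustrative sketch (not the paper's actual mechanism), the scheduling idea can be modeled as round-robin time slicing over kernel contexts, where each kernel is preempted after a fixed quantum of work and requeued; `Kernel`, `quantum`, and the work-unit abstraction below are all hypothetical:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    remaining: int  # work units still to execute

def preemptive_schedule(kernels, quantum=2):
    """Round-robin preemptive scheduling: each kernel runs for at most
    `quantum` work units before being preempted and requeued, so no
    single kernel can monopolize the device."""
    ready = deque(kernels)
    trace = []
    while ready:
        k = ready.popleft()
        ran = min(quantum, k.remaining)
        k.remaining -= ran
        trace.append((k.name, ran))
        if k.remaining > 0:
            ready.append(k)  # preempt: put back at the end of the queue
    return trace

trace = preemptive_schedule([Kernel("A", 5), Kernel("B", 3)])
# the two kernels interleave instead of A running to completion first
```

Without preemption, kernel A would occupy the device for all 5 units before B starts; with the quantum, B's first slice begins after only 2 units, which is the responsiveness property the abstract refers to.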


A software framework for efficient preemptive scheduling on GPU

The benefits of EffiSha are demonstrated by experimenting with a set of preemptive scheduling policies, which show significantly enhanced support for fairness and priority-aware scheduling of GPU kernels.

Simultaneous Multikernel: Fine-Grained Sharing of GPUs

Simultaneous Multikernel (SMK) is proposed, a fine-grained dynamic sharing mechanism that fully utilizes resources within a streaming multiprocessor by exploiting heterogeneity of different kernels.

CASE: a compiler-assisted SchEduling framework for multi-GPU systems

The results show that, compared with existing state-of-the-art methods, CASE improves throughput by up to 2.5× for Rodinia and up to 1.7× for Darknet on modern NVIDIA GPU platforms, largely because it improves average system utilization.

FlexSched: Efficient scheduling techniques for concurrent kernel execution on GPUs

This work presents FlexSched, a software scheduler that uses a low-overhead run-time mechanism to perform intra-SM allocation of the cooperative thread arrays (a.k.a. thread blocks) of co-executing kernels. It also implements a productive online profiling mechanism that dynamically adjusts each kernel's resource assignment according to the instantaneous performance of the co-running kernels.

MASK: Redesigning the GPU Memory Hierarchy to Support Multi-Application Concurrency

MASK, a new GPU framework that provides low-overhead virtual memory support for the concurrent execution of multiple applications, is proposed and evaluations show that MASK restores much of the throughput lost to TLB contention.

Simultaneous Multikernel GPU: Multi-tasking throughput processors via fine-grained sharing

Simultaneous Multikernel (SMK) is proposed, a fine-grain dynamic sharing mechanism, that fully utilizes resources within a streaming multiprocessor by exploiting heterogeneity of different kernels to improve system throughput while maintaining fairness.

Runtime Support for Adaptive Spatial Partitioning and Inter-Kernel Communication on GPUs

This paper describes a new scheduling mechanism for dynamic spatial partitioning of the GPU, which adapts to the current execution state of compute workloads on the device, and extends the OpenCL runtime environment to map multiple command queues to a single device, effectively partitioning the device.

Effective GPU Sharing Under Compiler Guidance

The proposed solution outperforms existing state-of-the-art solutions by leveraging its knowledge about applications’ multiple resource requirements, which include memory as well as SMs, and improves throughput by up to 2.5× for Rodinia benchmarks, and up to 1.7× for Darknet neural networks.

Cooperative kernels: GPU multitasking for blocking algorithms

This work describes a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluates the approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking.

GPUShare: Fair-Sharing Middleware for GPU Clouds

GPUShare is presented, a software-based mechanism that can yield a kernel before all of its threads have run, giving finer control over the time slice for which the GPU is allocated to a process and improving fair GPU sharing across tenants.

Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

Kernelet combines transparent memory management and PCIe data-transfer techniques with dynamic slicing and scheduling of kernel executions, and develops a novel Markov chain-based performance model to guide scheduling decisions.
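The slicing idea in Kernelet, and similarly in the elastic-kernels work below, is to split a kernel's grid of thread blocks into smaller sub-launches so a scheduler can interleave slices of different kernels. A minimal, hypothetical sketch of that grid-slicing step (the function name and `(first_block, num_blocks)` representation are assumptions for illustration, not the papers' APIs):

```python
def slice_grid(total_blocks, slice_size):
    """Split a kernel's grid of `total_blocks` thread blocks into slices
    of at most `slice_size` blocks. Each (first_block, num_blocks) pair
    can be launched as an independent sub-kernel, letting a scheduler
    interleave slices of different kernels on the GPU."""
    slices = []
    start = 0
    while start < total_blocks:
        n = min(slice_size, total_blocks - start)
        slices.append((start, n))
        start += n
    return slices

# a 10-block kernel sliced into chunks of at most 4 blocks
slices = slice_grid(10, 4)  # [(0, 4), (4, 4), (8, 2)]
```

The real systems must additionally remap block indices inside the kernel (since each slice restarts block numbering at zero) and choose `slice_size` to balance scheduling flexibility against per-launch overhead.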

Enabling Task Parallelism in the CUDA Scheduler

This work proposes an issue queue that merges workloads that would underutilize GPU processing resources so that they can run concurrently on an NVIDIA GPU; throughput increases in all cases where a single kernel would have underused the GPU.

TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments

TimeGraph is presented, a real-time GPU scheduler at the device-driver level for protecting important GPU workloads from performance interference and supports two priority-based scheduling policies in order to address the tradeoff between response times and throughput introduced by the asynchronous and non-preemptive nature of GPU processing.

Improving GPGPU concurrency with elastic kernels

This work studies concurrent execution of GPU kernels using multiprogram workloads on current NVIDIA Fermi GPUs, and proposes transformations that convert CUDA kernels into elastic kernels which permit fine-grained control over their resource usage.

Gdev: First-Class GPU Resource Management in the Operating System

Gdev is presented, a new ecosystem of GPU resource management in the operating system (OS) that allows the user space as well as the OS itself to use GPUs as first-class computing resources.

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

A framework to enable applications executing within virtual machines to transparently share one or more GPUs is presented and it is found that even when contention is high the consolidation algorithm is effective in improving the throughput, and that the runtime overhead of the framework is low.

GPU Resource Sharing and Virtualization on High Performance Computing Systems

A GPU resource virtualization approach is proposed that allows underutilized microprocessors to share GPUs, demonstrating a considerable performance gain over traditional SPMD execution without virtualization.

The case for GPGPU spatial multitasking

The case is made for a GPU multitasking technique called spatial multitasking, which allows GPU resources to be partitioned among multiple applications simultaneously and shows an average speedup of up to 1.19 over cooperative multitasking when two applications are sharing the GPU.

Fine-grained resource sharing for concurrent GPGPU kernels

KernelMerge provides a concurrent kernel scheduler compatible with the OpenCL API that runs two OpenCL kernels concurrently on one device and outlines a method for using KernelMerge to investigate how concurrent kernels influence each other, with the goal of predicting runtimes for concurrent execution from individual kernel runtimes.

Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces

This work is the first to explore GPU memory management units (MMUs), consisting of translation lookaside buffers (TLBs) and page table walkers (PTWs), for address translation in unified heterogeneous systems. It shows that a little TLB-awareness can make other GPU performance enhancements feasible in the face of cache-parallel address translation.