RGEM: A Responsive GPGPU Execution Model for Runtime Engines

@inproceedings{Kato2011RGEMAR,
  title={RGEM: A Responsive GPGPU Execution Model for Runtime Engines},
  author={Shinpei Kato and Karthik Lakshmanan and Aman Kumar and Mihir Kelkar and Yutaka Ishikawa and Ragunathan Raj Rajkumar},
  booktitle={2011 IEEE 32nd Real-Time Systems Symposium},
  year={2011},
  pages={57--66}
}
General-purpose computing on graphics processing units, also known as GPGPU, is a burgeoning technique to enhance the computation of parallel programs. Applying this technique to real-time applications, however, requires additional support for timeliness of execution. In particular, the non-preemptive nature of GPGPU, associated with copying data to/from the device memory and launching code onto the device, needs to be managed in a timely manner. In this paper, we present a responsive GPGPU… 
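The abstract above centers on the non-preemptive nature of GPU data copies and kernel launches. The general direction RGEM takes is to break long non-preemptive operations into smaller chunks so that preemption points arise at chunk boundaries. The following is a minimal, hypothetical Python sketch of that idea, not RGEM's actual implementation; the chunk size and copy bandwidth are assumed values, not figures from the paper:

```python
# Hypothetical sketch: splitting a long, non-preemptive device copy into
# fixed-size chunks creates preemption points at chunk boundaries, which
# bounds how long a high-priority request can be blocked behind a
# low-priority one.

CHUNK_BYTES = 1 << 20        # assumed chunk size: 1 MiB
COPY_RATE = 4 * (1 << 30)    # assumed effective copy bandwidth: 4 GiB/s

def copy_time(nbytes: int) -> float:
    """Time (seconds) to copy nbytes at the assumed bandwidth."""
    return nbytes / COPY_RATE

def worst_case_blocking(nbytes: int, chunked: bool) -> float:
    """Longest time a newly arrived high-priority request must wait while
    a low-priority copy of `nbytes` occupies the non-preemptive engine."""
    if chunked:
        # Only the chunk currently in flight is non-preemptive.
        return copy_time(min(nbytes, CHUNK_BYTES))
    return copy_time(nbytes)  # the whole copy is one non-preemptive unit

low_prio_copy = 256 * (1 << 20)  # a 256 MiB low-priority transfer
print(worst_case_blocking(low_prio_copy, chunked=False))  # 0.0625 s
print(worst_case_blocking(low_prio_copy, chunked=True))   # ~0.00024 s
```

The blocking term shrinks from the full transfer time to a single chunk's transfer time, at the cost of per-chunk submission overhead, which is the trade-off chunk-size selection must balance.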
GPES: a preemptive execution system for GPGPU computing
TLDR
Experimental results demonstrate that GPES is able to reduce the pending time of high-priority tasks in a multitasking environment by up to 90% over the existing GPU driver solutions, while introducing small overheads.
Supporting Preemptive Task Executions and Memory Copies in GPGPUs
  • Can Basaran, K. Kang
  • Computer Science
    2012 24th Euromicro Conference on Real-Time Systems
  • 2012
TLDR
A new lightweight approach to supporting preemptive memory copies and job executions in GPGPUs is presented, and it is shown that the response time of the approach is significantly shorter than that of the unmodified GPGPU runtime system, which supports no preemption.
Enhancing manageability of execution and data for GPGPU computing
TLDR
Symphony focuses on abstracting scheduling control at the granularity of thread blocks of application kernels, and uses the software supervisor approach where a supervisor kernel cooperates with the hardware thread block scheduler to schedule application kernels on the SMs.
GLoop: an event-driven runtime for consolidating GPGPU applications
TLDR
GLoop is presented, a software runtime that enables consolidation of GPGPU apps, including GPU eaters. GLoop offers an event-driven programming model, which allows GLoop-based apps to inherit the GPU eater's high functionality while GLoop proportionally schedules them on a shared GPU in an isolated manner.
Dynamic schedule management framework for aperiodic soft-real-time jobs on GPU based architectures
TLDR
A schedule management framework for aperiodic soft-real-time jobs that may be used by a CPU-GPU system designer/integrator to select, configure, and deploy a suitable architectural platform and to perform concurrent scheduling of these jobs.
Cooperative GPGPU Scheduling for Consolidating Server Workloads
TLDR
GLoop is presented, a software runtime that enables consolidation of GPGPU apps, including GPU eaters. GLoop offers an event-driven programming model, which allows GLoop-based apps to inherit the GPU eater's high functionality while GLoop proportionally schedules them on a shared GPU in an isolated manner.
Device Hopping
TLDR
This work subdivides iteration spaces into slices and considers migration on a slice-by-slice basis; it shows that slice sizes can be learned offline by machine learning models, and reduces code size by at least 88% compared to manual implementations of migratable kernels.
Dynamic Memory Bandwidth Allocation for Real-Time GPU-Based SoC Platforms
TLDR
This article proposes a novel memory bandwidth allocation scheme that dynamically monitors the progress of real-time applications and increases the bandwidth share of best-effort (BE) ones whenever it is safe to do so, and demonstrates its effectiveness on a variety of GPU and CPU benchmarks.
GPUSync: A Framework for Real-Time GPU Management
TLDR
GPUSync is described, which is a framework for managing graphics processing units (GPUs) in multi-GPU multicore real-time systems and provides budget policing to the extent possible, given that GPU access is non-preemptive.
...

References

Showing 1-10 of 30 references
Enabling Task Parallelism in the CUDA Scheduler
TLDR
An issue queue is proposed that merges workloads which would individually underutilize GPU processing resources so that they can run concurrently on an NVIDIA GPU; throughput is increased in all cases where the GPU would have been underused by a single kernel.
StoreGPU: exploiting graphics processing units to accelerate distributed storage systems
TLDR
StoreGPU is designed, a library that accelerates a number of hashing-based primitives popular in distributed storage system implementations, enabling up to eight-fold performance gains on synthetic benchmarks as well as on a high-level application: online similarity detection between large data files.
Anytime Algorithms for GPU Architectures
TLDR
This investigation focuses on the development of time-bounded anytime algorithms on Graphics Processing Units (GPUs) that trade off output quality against execution time, to enable imprecise and approximate real-time computation on parallel architectures for stream-based time-bounded applications.
TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments
TLDR
TimeGraph is presented, a real-time GPU scheduler at the device-driver level for protecting important GPU workloads from performance interference and supports two priority-based scheduling policies in order to address the tradeoff between response times and throughput introduced by the asynchronous and non-preemptive nature of GPU processing.
A GPU accelerated storage system
TLDR
The design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing is presented and the results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.
GViM: GPU-accelerated virtual machines
TLDR
GViM is presented, a system designed for virtualizing and managing the resources of a general-purpose system accelerated by graphics processors, and it is shown how such accelerators can be virtualized without additional hardware support.
Operating Systems Challenges for GPU Resource Management
TLDR
The preliminary evaluation demonstrates that the performance of open-source software is competitive with that of proprietary software, and hence operating systems research using GPU technology can start investigating GPU resource management.
FAST: fast architecture sensitive tree search on modern CPUs and GPUs
TLDR
FAST is an extremely fast, architecture-sensitive layout of the index tree, logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware; it achieves a 6X performance improvement over uncompressed index search for large keys on CPUs.
PTask: operating system abstractions to manage GPUs as compute devices
TLDR
It is shown that the PTask API can provide important system-wide guarantees where there were previously none, and can enable significant performance improvements, for example gaining a 5× improvement in maximum throughput for the gestural interface.
Dynamic load balancing on single- and multi-GPU systems
TLDR
Experimental results show that the proposed task-based dynamic load-balancing solution can utilize the hardware more efficiently than the CUDA scheduler for unbalanced workloads, and achieves near-linear speedup, load balance, and significant performance improvement over techniques based on standard CUDA APIs.
...