Corpus ID: 211532459

Improvement of Automatic GPU Offloading Technology for Application Loop Statements

@article{Yamato2020ImprovementOA,
  title={Improvement of Automatic GPU Offloading Technology for Application Loop Statements},
  author={Yoji Yamato},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.12115}
}
  • Y. Yamato
  • Published 27 February 2020
  • Computer Science
  • ArXiv
In recent years, with the slowing of Moore's law, the use of hardware other than CPUs, such as GPUs and FPGAs, is increasing. However, when using such heterogeneous hardware, the barriers posed by technical skills such as CUDA and HDL are high. Based on that, I have proposed environment-adaptive software that enables automatic conversion, configuration, and high-performance operation of once-written code according to the hardware on which it is placed. Part of the offloading to the GPU and FPGA…

References

Showing 1-10 of 40 references
OpenACC - First Experiences with Real-World Applications
Presents the first experiences with OpenACC, an API consisting of compiler directives for offloading loops and regions of C/C++ and Fortran code to accelerators, and finds that OpenACC offers a promising ratio of development effort to performance and that a directive-based approach to programming accelerators is more efficient than low-level APIs, even if suboptimal performance is achieved.
Compiler support of the workqueuing execution model for Intel SMP architectures
Gives an overview of the workqueuing model, which allows the user to exploit irregular parallelism, and describes both the generated multithreaded code and the run-time library routines supporting it.
CUDA by Example: An Introduction to General-Purpose GPU Programming
  • Jie Cheng
  • Computer Science
  • Scalable Comput. Pract. Exp.
  • 2010
This book is designed for readers interested in learning how to develop general parallel applications on graphics processing units (GPUs) using CUDA C, a programming language that combines the industry-standard C language with additional features that exploit the CUDA architecture.
Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters
Proposes a hybrid scheduling technique for GPU-based computer clusters that minimizes the execution time of a submitted job using dynamic profiles of Map tasks running on CPU cores and GPU devices.
Implementing the PGI Accelerator model
Presents details of the design of the compiler that implements the PGI Accelerator model, focusing on the Planner, the element that maps the program parallelism onto the hardware parallelism.
Study of parallel processing area extraction and data transfer number reduction for automatic GPU offloading of IoT applications
  • Y. Yamato
  • Computer Science
  • Journal of Intelligent Information Systems
  • 2019
Proposes an improved GPU offloading method with fewer data transfers between the CPU and GPU that can improve the performance of many IoT applications.
Server Selection, Configuration and Reconfiguration Technology for IaaS Cloud with Multiple Server Types
  • Y. Yamato
  • Computer Science
  • Journal of Network and Systems Management
  • 2017
Proposes a server selection, configuration, reconfiguration, and automatic performance verification technology that meets user functional and performance requirements on various types of cloud compute servers, enabling cloud providers to provision compute resources on appropriate hardware based on user requirements.
Automatic GPU Offloading Technology for Open IoT Environment
Proposes an automatic graphics processing unit (GPU) offloading technology as a new elementary technology of Tacit Computing, which uses a genetic algorithm to automatically extract appropriate offload areas from parallelizable loop statements and thereby improve the performance of IoT applications.
Optimum Application Deployment Technology for Heterogeneous IaaS Cloud
  • Y. Yamato
  • Computer Science
  • J. Inf. Process.
  • 2017
Proposes a PaaS that analyzes application logic and automatically offloads computations to GPUs and FPGAs when users deploy applications to clouds.
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing web search engine by nearly a factor of two.