Register allocation for Intel processor graphics

@article{Chen2018RegisterAF,
  title={Register allocation for Intel processor graphics},
  author={Weiyu Chen and Guei-Yuan Lueh and Pratik Ashar and Kaiyu Chen and Buqi Cheng},
  journal={Proceedings of the 2018 International Symposium on Code Generation and Optimization},
  year={2018}
}
  • Weiyu Chen, Guei-Yuan Lueh, Pratik Ashar, Kaiyu Chen, Buqi Cheng
  • Published 24 February 2018
  • Computer Science
  • Proceedings of the 2018 International Symposium on Code Generation and Optimization
Register allocation is a well-studied problem, but surprisingly little work has been published on assigning registers for GPU architectures. […] Several extensions are introduced to the traditional graph-coloring algorithm to support variables with different sizes and to accurately model liveness under divergent branches. Different assignment policies are applied to exploit the trade-offs between minimizing register usage and avoiding bank conflicts and anti-dependencies. Experimental results show…
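
To make the idea of coloring variables of different sizes concrete, here is a minimal sketch. It is not the allocator described in the paper: the function name `color_with_sizes`, the largest-first greedy ordering, and the contiguous-block requirement are all assumptions chosen for illustration, and the sketch ignores divergent-branch liveness, bank conflicts, and anti-dependencies.

```python
from typing import Dict, Optional, Set


def color_with_sizes(
    sizes: Dict[str, int],
    interference: Dict[str, Set[str]],
    num_regs: int,
) -> Dict[str, Optional[int]]:
    """Greedy coloring where each variable needs sizes[v] contiguous registers.

    Returns a map from variable to the first register of its block,
    or None when no block fits (the variable would have to be spilled).
    Hypothetical helper, not taken from the paper.
    """
    assignment: Dict[str, Optional[int]] = {}
    # Assumption: place larger variables first, breaking ties by name.
    order = sorted(sizes, key=lambda v: (-sizes[v], v))
    for v in order:
        # Collect registers held by interfering variables already placed.
        busy: Set[int] = set()
        for n in interference.get(v, set()):
            start = assignment.get(n)
            if start is not None:
                busy.update(range(start, start + sizes[n]))
        # Take the lowest-numbered free block that is wide enough.
        assignment[v] = None
        for start in range(0, num_regs - sizes[v] + 1):
            if all(r not in busy for r in range(start, start + sizes[v])):
                assignment[v] = start
                break
    return assignment
```

With three registers, a two-register variable `a` that interferes with single-register variables `b` and `c` takes registers 0 and 1; `b` and `c` can share register 2 as long as they do not interfere with each other. If they do, one of them gets no register and is marked for spilling.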

Irregular Register Allocation for Translation of Test-pattern Programs

This article proposes a solution based on partitioned Boolean quadratic programming (PBQP) for ATE register allocation that trades off the allocation time and allocation search space, and experimental results show that the proposed register allocator successfully finds valid solutions in all cases.

IGC: The Open Source Intel Graphics Compiler

The Intel Graphics Compiler (IGC), the LLVM-based production compiler for Intel HD and Iris graphics, and its OpenCL compute stack including compute runtime, compiler frontend and backend, and architecture specification is fully open-source, giving a unique opportunity for developers to optimize the entire stack.

Optimizing occupancy and ILP on the GPU using a combinatorial approach

This paper presents the first general solution to the problem of optimizing both occupancy and Instruction-Level Parallelism (ILP) when compiling for a Graphics Processing Unit (GPU) by treating occupancy as a primary objective and ILP as a secondary objective.

C-for-Metal: High Performance SIMD Programming on Intel GPUs

Experimental results show that CM applications from different domains outperform the best-known SIMT-based OpenCL implementations, achieving up to 2.7x speedup on the latest Intel GPU.

High-performance and balanced parallel graph coloring on multicore platforms

The proposed algorithmic design is extended to propose a balanced graph coloring algorithm, named BalColorTM, with which all color classes include almost the same number of vertices to achieve high parallelism and resource utilization in the execution of the real-world end-applications.

Output-based Intermediate Representation for Translation of Test-pattern Program

This paper proposes a completely different type of IR, generated as a result of running the source program, the output-based IR, using the output pattern itself as an IR, which obviates designing a static IR considering all ATE programming languages and hardware differences.

iGPU Leak: An Information Leakage Vulnerability on Intel Integrated GPU

In this work, a critical information leakage vulnerability due to defective GPU context management is disclosed, which means adversaries can recover the secret key of a cryptographic algorithm running on an iGPU from a single snapshot of the leaking channel.

RL4ReAl: Reinforcement Learning for Register Allocation

A novel solution for the Register Allocation problem is proposed, leveraging multi-agent hierarchical Reinforcement Learning, and a gRPC-based framework is developed providing a modular and efficient compiler interface for training and inference.

Solving Multi-Coloring Combinatorial Optimization Problems Using Hybrid Quantum Algorithms

The variational quantum eigensolver (VQE) technique and quantum approximate optimization algorithm (QAOA) are utilized to find solutions for three combinatorial applications by both transferring each problem model to the corresponding Ising model and by using the calculated Hamiltonian matrices.

A distributed large graph coloring algorithm on Giraph

This paper presents a novel graph coloring algorithm designed to use the simple parallelization technique provided by the Giraph framework or any other vertex-centric paradigm, and compares it to existing Giraph graph coloring algorithms with regard to solution quality and CPU runtime.

References

Register Spilling and Live-Range Splitting for SSA-Form Programs

This paper generalizes the well-known furthest-first algorithm, which is known to work well on straight-line code, to control-flow graphs, and presents a spilling algorithm for programs in SSA form that is competitive with standard linear-scan allocators.

Enabling coordinated register allocation and thread-level parallelism optimization for GPUs

This paper proposes Coordinated Register Allocation and Thread-level parallelism (CRAT) to explore the optimization space of register allocation and TLP management on GPUs, and shows that CRAT-static works statically to explore the trade-off between TLP and register allocation, while CRAT-dyn exploits dynamic register allocation for further improvement.

Author retrospective for code scheduling and register allocation in large basic blocks

This work introduced the concept of the DAG-driven register allocation, and defined two terms, width and height of a DAG, in the context of code scheduling.

Fusion-based register allocation

Fusion-based register allocation uses the structure of the program to make splitting and spilling decisions, with the goal to move overhead operations to infrequently executed parts of a program.

Retargetable Graph-Coloring Register Allocation for Irregular Architectures

This work presents a generalization of graph-coloring register allocation that can handle irregular architectural features like overlapping register pairs, special purpose registers, and multiple register banks, and that is parameterized on a formal target description, allowing fully automatic retargeting.

Register allocation and spilling via graph coloring

This work has discovered how to extend the graph coloring approach so that it naturally solves the spilling problem, producing better object code while taking much less compile time.

Register allocation for programs in SSA form

A novel register allocation architecture for programs in SSA form is presented which simplifies register allocation significantly, and heuristic methods for spilling and coalescing are compared to an optimal method based on integer linear programming.

Register allocation by puzzle solving

We show that register allocation can be viewed as solving a collection of puzzles. We model the register file as a puzzle board and the program variables as puzzle pieces; pre-coloring and register

Combining Register Allocation and Instruction Scheduling

Preliminary experiments indicate that the (alpha,beta)-Combined Heuristic yields improvements in the range of 16-21% compared to the phase-ordered solutions, when the input graphs contain balanced amount of register pressure and instruction-level parallelism.

Register allocation with instruction scheduling

In this framework an optimal coloring of a graph, called the parallel interference graph, provides an optimal register allocation and preserves the property that no false dependences are introduced, thus all the options for parallelism are kept for the scheduler to handle.