Subdividing the iteration space of a loop into blocks or <italic>tiles</italic> with a fixed maximum size has several advantages. Tiles become a natural candidate as the unit of work for parallel task scheduling. Synchronization between processors can be done between tiles, reducing synchronization frequency (at some loss of potential parallelism). The… (More)
Compilers for vector or multiprocessor computers must have certain optimization features to successfully generate parallel code.
Dependence graphs can be used as a vehicle for formulating and implementing compiler optimizations. This paper defines such graphs and discusses two kinds of transformations. The first are simple rewriting transformations that remove dependence arcs. The second are abstraction transformations that deal more globally with a dependence graph. These… (More)
The PGI Accelerator model is a high-level programming model for accelerators, such as GPUs, similar in design and scope to the widely-used OpenMP directives. This paper presents some details of the design of the compiler that implements the model, focusing on the <i>Planner</i>, the element that maps the program parallelism onto the hardware parallelism.
Linear induction variable detection is usually associated with the strength reduction optimization. For restructuring compilers, eeective data dependence analysis requires that the compiler detect and accurately describe linear and nonlinear induction variables as well as more general sequences. In this article we present a practical technique for detecting… (More)
Induction variable detection is usually closely tied to the strength reduction optimization. This paper studies induction variable analysis from a different perspective, that of finding induction variables for data dependence analysis. While classical induction variable analysis techniques have been used successfully up to now, we have found a simple… (More)
We describe and prove algorithms to convert programs which use the Parallel Computing Forum Parallel Sections construct into Static Single Assignment (SSA) form. This proces allows compilers to apply classical scalar optimization algorithms to explicitly parallel programs. To do so, we must define what the concept of <italic>dominator</italic> and… (More)