This paper presents an overview of an ongoing NSF-sponsored project for the study of runtime systems and compilers to support the development of efficient OpenMP parallel programs for distributed memory systems. The first part of the paper discusses a prototype compiler, now under development, that will accept OpenMP and will target TreadMarks, a Software…
Dependence graphs can be used as a vehicle for formulating and implementing compiler optimizations. This paper defines such graphs and discusses two kinds of transformations. The first are simple rewriting transformations that remove dependence arcs. The second are abstraction transformations that deal more globally with a dependence graph. These…
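As a rough illustration of a rewriting transformation that removes dependence arcs, the sketch below builds a dependence graph for straight-line code and then renames every assignment to a fresh variable. Renaming eliminates anti and output arcs and leaves only flow arcs. The `Stmt` representation, the conservative all-pairs arc rule, and the choice of renaming as the example rewrite are assumptions made here for illustration; they are not taken from the paper.

```python
from collections import namedtuple

# A statement "target = f(reads...)" in straight-line code (hypothetical representation).
Stmt = namedtuple("Stmt", ["target", "reads"])


def dependence_arcs(stmts):
    """Return arcs (i, j, kind, var) with i < j.

    Conservative: every pair of conflicting accesses gets an arc;
    no kill analysis is done for intervening writes.
    """
    arcs = []
    for i, si in enumerate(stmts):
        for j in range(i + 1, len(stmts)):
            sj = stmts[j]
            if si.target in sj.reads:      # write then read: flow (true) dependence
                arcs.append((i, j, "flow", si.target))
            if sj.target in si.reads:      # read then write: anti dependence
                arcs.append((i, j, "anti", sj.target))
            if si.target == sj.target:     # write then write: output dependence
                arcs.append((i, j, "output", si.target))
    return arcs


def rename(stmts):
    """Give each assignment a fresh name (a.1, a.2, ...) and point reads at the latest version."""
    version = {}

    def current(v):
        # Variables never assigned in the block keep their original (input) name.
        return f"{v}.{version[v]}" if v in version else v

    renamed = []
    for s in stmts:
        reads = tuple(current(v) for v in s.reads)  # resolve reads before this write
        version[s.target] = version.get(s.target, 0) + 1
        renamed.append(Stmt(current(s.target), reads))
    return renamed


program = [
    Stmt("a", ("b", "c")),  # S0: a = b + c
    Stmt("b", ("a",)),      # S1: b = a * 2
    Stmt("a", ("d",)),      # S2: a = d - 1
    Stmt("e", ("a", "b")),  # S3: e = a + b
]

before = dependence_arcs(program)
after = dependence_arcs(rename(program))
print(sorted({kind for _, _, kind, _ in before}))  # ['anti', 'flow', 'output']
print(after)  # [(0, 1, 'flow', 'a.1'), (1, 3, 'flow', 'b.1'), (2, 3, 'flow', 'a.2')]
```

After renaming, S2 has no incoming arcs, so it could be scheduled before or alongside S0 and S1. A real compiler would also have to copy the final versions (here `a.2` and `b.1`) back to the original names if they are live after the block.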
Simultaneity is a key to high-speed computation. Assuming hardware components of a given speed, it is the only remaining consideration in achieving raw speed. Simultaneity can be shackled by dependences, however, and years of hardware and software work have been devoted to understanding the types of dependences and how they can be obeyed or removed from a…
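A minimal illustration of how a dependence can shackle simultaneity: in the first loop below, each iteration reads the value written by the previous one (a loop-carried flow dependence), so running the iterations in a different order changes the result. In the second loop the iterations are independent, so any order, including a parallel one, gives the same answer. The loops and the reordering check are illustrative assumptions, not examples from the paper.

```python
def carried(a, b, order):
    """a[i] = a[i-1] + b[i]: iteration i depends on iteration i-1."""
    a = list(a)
    for i in order:
        a[i] = a[i - 1] + b[i]
    return a


def independent(a, b, order):
    """a[i] = a[i] + b[i]: no iteration touches another iteration's data."""
    a = list(a)
    for i in order:
        a[i] = a[i] + b[i]
    return a


a, b = [0] * 5, [1] * 5
forward = range(1, 5)
backward = range(4, 0, -1)

print(carried(a, b, forward), carried(a, b, backward))
# [0, 1, 2, 3, 4] [0, 1, 1, 1, 1]
print(independent(a, b, forward) == independent(a, b, backward))  # True
```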
This paper introduces hierarchical overlapped tiling, a transformation that applies loop tiling and fusion to conventional loops. Overlapped tiling is a useful transformation to reduce communication overhead, but it may also generate a significant amount of redundant computation. Hierarchical overlapped tiling performs overlapped tiling hierarchically to…
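The trade-off described here can be seen in plain (single-level) overlapped tiling of a 1-D stencil. In the sketch below, each tile loads a halo of width `steps` on both sides, advances its local copy `steps` time steps without exchanging data with neighboring tiles, and keeps only its own range; the halo cells are computed and then discarded, which is the redundant work. The integer 3-point stencil, the tile sizes, and the function names are hypothetical choices for illustration, and the hierarchical variant from the paper is not shown.

```python
def step(u):
    """One time step of a 3-point integer stencil; the two end cells are held fixed."""
    n = len(u)
    return [u[i] if i in (0, n - 1) else (u[i - 1] + 2 * u[i] + u[i + 1]) // 4
            for i in range(n)]


def run(u, steps):
    """Reference (untiled) computation."""
    for _ in range(steps):
        u = step(u)
    return u


def overlapped_tiled(u, steps, tile):
    """Overlapped tiling: each tile recomputes a halo of width `steps` instead of communicating."""
    n = len(u)
    out = [None] * n
    halo_cells = 0
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo, hi = max(0, start - steps), min(n, end + steps)
        local = run(u[lo:hi], steps)             # no data exchange with other tiles
        out[start:end] = local[start - lo:end - lo]
        halo_cells += (hi - lo) - (end - start)  # cells carried only to feed this tile
    return out, halo_cells


u = [(i * i) % 7 for i in range(20)]
tiled, halo = overlapped_tiled(u, steps=3, tile=5)
print(tiled == run(u, 3), halo)  # True 18
```

Fusing more time steps per tile means fewer synchronization points but wider halos and more recomputation; the abstract suggests the hierarchical scheme is aimed at reducing that redundancy.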
With modern massively parallel processors consisting of hundreds of RISC processors, a significant portion of program run time is being spent in passing messages between processors. This paper addresses the problem of reducing unnecessary copies in message passing protocols. Message passing that avoids using copies in the application program, and the…
Pronounced spatial nonuniformities in cell density, physiology, and activity frequently arise within densely packed immobilized cell supports. For a more fundamental understanding of immobilized cell phenomena, we have developed high-resolution microfluorimetric procedures to analyze local variations in both immobilized cell loading and growth rate.
The KAP preprocessor optimizes DEC Fortran and DEC C programs to achieve their best performance on Digital Alpha systems. One key optimization that KAP performs is the parallelization of programs for Alpha shared memory multiprocessors that use the new capabilities of the DEC OSF/1 version 3.0 operating system with DECthreads. The heart of the optimizer is…