Trung N. Nguyen

We present several new compiler techniques employed by our interprocedural parallelizing research compiler, Panorama, to improve loop parallelization and the efficiency of memory references. We first present an overview of the compiler and its associated memory architecture simulation environments. We then present an interprocedural array dataflow analysis, ...
In order to reduce remote memory accesses on CC-NUMA multiprocessors, we present an interprocedural analysis to support static loop scheduling and data allocation. Given a parallelized program, the compiler constructs graphs which represent globally and interprocedurally the remote reference penalties associated with different choices for loop scheduling ...
In the decision regarding static scheduling vs. dynamic scheduling, the only argument against the former is the potential imbalance of the workload. However, it has never been clear how the workload is distributed across the iterations of Fortran parallel loops. This work examines a set of Perfect benchmarking programs [2] and reports two striking results. First, ...
Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from both academia and industry. This paper studies the performance impact of design choices at different levels of address and memory mapping on CC-NUMA architectures. Through execution-driven simulations of five numerical programs, we find close interactions ...
Array data flow analysis is known to be crucial to the success of array privatization, one of the most important techniques for program parallelization. It is clear that array data flow analysis should be performed interprocedurally and symbolically, and that it often needs to handle the predicates represented by IF conditions. Unfortunately, such a powerful ...
We present an interprocedural program analysis to support static loop scheduling and data allocation with the objective of reducing remote memory references on CC-NUMA multiprocessors. Given a program which consists of parallel regions in the form of DOALL loops and sequential regions, we build an interprocedural control flow graph and annotate it with array ...