Trung N. Nguyen

We present several new compiler techniques employed by our interprocedural parallelizing research compiler, Panorama, to improve loop parallelization and the efficiency of memory references. We first present an overview of the compiler and its associated memory architecture simulation environments. We then present an interprocedural array dataflow analysis, …
Dynamically tagged directories have recently been proposed as a memory-efficient mechanism for maintaining cache coherence in large-scale shared-memory multiprocessors. To use these directories efficiently, the number of pointer operations must be minimized and pointers should be allocated as late as possible. If pointers are allocated too early, …
To reduce remote memory accesses on CC-NUMA multiprocessors, we present an interprocedural analysis to support static loop scheduling and data allocation. Given a parallelized program, the compiler constructs graphs that represent, globally and interprocedurally, the remote reference penalties associated with different choices for loop scheduling …
In deciding between static and dynamic scheduling, the only argument against the former is the potential imbalance of the workload. However, it has never been clear how the workload is distributed across the iterations of Fortran parallel loops. This work examines a set of Perfect benchmark programs [2] and reports two striking results. First, …
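To make the imbalance argument concrete, here is a minimal sketch, assuming the cost of every loop iteration is known in advance (the function names and cost lists are hypothetical and not taken from the paper). It assigns contiguous blocks of iterations to processors, as a static block schedule would, and reports the ratio of the slowest processor's work to the average work.

```python
from typing import List, Sequence


def static_block_schedule(n_iters: int, n_procs: int) -> List[range]:
    """Split iterations 0..n_iters-1 into n_procs contiguous blocks.
    When n_iters is not divisible by n_procs, the first blocks get
    one extra iteration each."""
    base, extra = divmod(n_iters, n_procs)
    blocks = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def load_imbalance(costs: Sequence[float], n_procs: int) -> float:
    """Max processor load divided by average load under a static block
    schedule. 1.0 means perfectly balanced; larger values mean other
    processors sit idle while the slowest one finishes."""
    blocks = static_block_schedule(len(costs), n_procs)
    loads = [sum(costs[i] for i in b) for b in blocks]
    avg = sum(loads) / n_procs
    return max(loads) / avg if avg > 0 else 1.0


if __name__ == "__main__":
    uniform = [1.0] * 16                            # every iteration costs the same
    triangular = [float(i + 1) for i in range(16)]  # cost grows with the index
    print(load_imbalance(uniform, 4))
    print(load_imbalance(triangular, 4))
```

With uniform costs the static schedule is perfectly balanced; with costs that grow linearly across iterations, the last block carries far more work than the first, which is exactly the situation in which dynamic scheduling would pay off.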
Fuzzy vault is one of the most popular algorithms for protecting a biometric template and a secret key simultaneously. In the fuzzy vault scheme, the biometric features are used to lock and unlock the secret key, which is encoded in the coefficients of a polynomial. Its security depends on the infeasibility of the polynomial …
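The locking and unlocking steps described above can be sketched as follows. This is a toy illustration under stated assumptions, not the scheme evaluated in the paper: features are already quantized to distinct integers in the prime field GF(65537), the secret is the list of polynomial coefficients, chaff points are drawn at random off the polynomial, and the correct secret is recognized by comparing against a stored SHA-256 digest (a hypothetical stand-in for the CRC or hash check that practical vaults use). Unlocking tries every combination of degree + 1 matching points and recovers the coefficients by Lagrange interpolation.

```python
import hashlib
import itertools
import random
from typing import List, Optional, Sequence, Tuple

P = 65537  # prime modulus; assumption: features and secret fit in GF(P)
Point = Tuple[int, int]


def poly_eval(coeffs: Sequence[int], x: int) -> int:
    """Evaluate sum(coeffs[i] * x**i) mod P using Horner's rule."""
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % P
    return y


def lock(secret: Sequence[int], features: Sequence[int],
         n_chaff: int, rng: random.Random) -> Tuple[List[Point], str]:
    """Project genuine features onto the secret polynomial, hide them among
    chaff points, and return the shuffled vault plus a verification digest."""
    points = [(x % P, poly_eval(secret, x % P)) for x in features]
    used = {x for x, _ in points}
    while len(points) < len(features) + n_chaff:
        x, y = rng.randrange(P), rng.randrange(P)
        # Chaff needs a fresh x coordinate and must lie off the polynomial.
        if x not in used and y != poly_eval(secret, x):
            used.add(x)
            points.append((x, y))
    rng.shuffle(points)
    digest = hashlib.sha256(repr(list(secret)).encode()).hexdigest()
    return points, digest


def interpolate(pts: Sequence[Point]) -> List[int]:
    """Coefficients (lowest degree first) of the unique polynomial of
    degree < len(pts) through pts over GF(P), via Lagrange basis polynomials."""
    n = len(pts)
    coeffs = [0] * n
    for i, (xi, yi) in enumerate(pts):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(pts):
            if j == i:
                continue
            # Multiply the basis polynomial by (x - xj).
            new = [0] * (len(basis) + 1)
            for k, b in enumerate(basis):
                new[k] = (new[k] - xj * b) % P
                new[k + 1] = (new[k + 1] + b) % P
            basis = new
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, P - 2, P) % P  # modular inverse via Fermat
        for k in range(n):
            coeffs[k] = (coeffs[k] + scale * basis[k]) % P
    return coeffs


def unlock(vault: Sequence[Point], query: Sequence[int],
           degree: int, digest: str) -> Optional[List[int]]:
    """Keep vault points whose x matches a query feature, then try every
    (degree + 1)-subset until one interpolates to the stored secret."""
    qs = {q % P for q in query}
    candidates = [pt for pt in vault if pt[0] in qs]
    for subset in itertools.combinations(candidates, degree + 1):
        coeffs = interpolate(subset)
        if hashlib.sha256(repr(coeffs).encode()).hexdigest() == digest:
            return coeffs
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    secret = [1234, 42, 7]               # degree-2 polynomial encodes the key
    genuine = [101, 202, 303, 404, 505]  # hypothetical quantized features
    vault, digest = lock(secret, genuine, n_chaff=30, rng=rng)
    print(unlock(vault, [101, 303, 505, 999], 2, digest))  # enough overlap
    print(unlock(vault, [101, 999, 888], 2, digest))       # too little overlap
```

A query that shares at least degree + 1 features with the enrolled template releases the key; a query with too little overlap cannot, because any subset that includes a chaff point interpolates to a different polynomial.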
Cache-Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from both academia and industry. This paper studies the performance impact of design choices at different levels of address and memory mapping on CC-NUMA architectures. Through execution-driven simulations of five numerical programs, we find close interactions …
Array data flow analysis is known to be crucial to the success of array privatization, one of the most important techniques for program parallelization. It is clear that array data flow analysis should be performed interprocedurally and symbolically, and that it often needs to handle the predicates represented by IF conditions. Unfortunately, such a powerful …
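The property that array data flow analysis must establish for privatization can be illustrated with a toy check. This minimal sketch assumes each loop iteration's accesses to one array are given as a concrete, ordered trace of hypothetical ("R" or "W", index) pairs; it ignores values that are live after the loop (copy-out). An array is privatizable when no iteration reads an element it has not already written earlier in that same iteration, i.e. there are no upward-exposed reads.

```python
from typing import Iterable, Sequence, Set, Tuple

Access = Tuple[str, int]  # ("R" or "W", array index); hypothetical trace format


def upward_exposed_reads(iteration: Sequence[Access]) -> Set[int]:
    """Indices read in this iteration before any write to them in the same iteration."""
    written: Set[int] = set()
    exposed: Set[int] = set()
    for op, idx in iteration:
        if op == "W":
            written.add(idx)
        elif op == "R" and idx not in written:
            exposed.add(idx)
    return exposed


def is_privatizable(iterations: Iterable[Sequence[Access]]) -> bool:
    """True if no iteration has an upward-exposed read, so each iteration
    could use its own private copy of the array (live-out values ignored)."""
    return all(not upward_exposed_reads(it) for it in iterations)


if __name__ == "__main__":
    ok = [("W", 0), ("W", 1), ("R", 0), ("R", 1)]  # define, then use
    bad = [("R", 2), ("W", 2)]                     # reads a value from outside
    print(is_privatizable([ok, ok]))
    print(is_privatizable([ok, bad]))
```

A compiler cannot enumerate iterations like this; a symbolic, interprocedural analysis has to prove the same property for all iterations at once, including cases where the writes and reads are guarded by IF conditions.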
Dynamically tagged directories have been proposed as a memory-efficient mechanism for maintaining cache coherence in large-scale shared-memory multiprocessors. To use these directories efficiently, the run-time overhead caused by directory pointer overflow must be reduced by allocating pointers as late as possible. If directory pointers are allocated too early, …