Albert Hartono

Learn More
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously. Through this work, we show the practicality of analytical model-driven automatic transformation in the polyhedral model(More)
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in(More)
As new processor and memory architectures advance, clusters start to be built from larger SMP systems, which makes MPI intra-node communication a critical issue in high performance computing. This paper presents a new design for MPI intra-node communication that aims to achieve both high performance and good scalability in a cluster environment. The design(More)
Tiling, a key transformation for optimizing programs, has been widely studied in literature. Parameterized tiled code is important for auto-tuning systems since they often execute a large number of runs with dynamically varied tile sizes. Previous work on tiled code generation has addressed parameterized tiling for the sequential context, and the parallel(More)
Loop tiling is an important compiler transformation used for enhancing data locality and exploiting coarsegrained parallelism. Tiled codes in which tile sizes are runtime parameters — called parametrically-tiled codes — are important for empirical tuning systems like ATLAS. Some recent work has addressed the problem of generating sequential(More)
For many scientific applications, significant time is spent in tuning codes for a particular high-performance architecture. Tuning approaches range from the relatively nonintrusive (e.g., by using compiler options) to extensive code modifications that attempt to exploit specific architecture features. Intrusive techniques often result in code changes that(More)
Complex tensor contraction expressions arise in accurate electronic structure models in quantum chemistry, such as the Coupled Cluster method. Transformations using algebraic properties of commutativity and associativity can be used to significantly decrease the number of arithmetic operations required for evaluation of these expressions, but the(More)
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in(More)
The ubiquity of multicore processors in commodity computing systems has raised a significant programming challenge for their effective use. An attractive but challenging approach is automatic parallelization of sequential codes. Although virtually all production C compilers have automatic shared-memory parallelization capability, it is rarely used in(More)