A practical automatic polyhedral parallelizer and locality optimizer
TLDR
We present the design and implementation of an automatic polyhedral source-to-source transformation framework that can optimize regular programs (sequences of possibly imperfectly nested loops) for parallelism and locality simultaneously.
Parameterized tiling revisited
TLDR
Tiling, a key transformation for optimizing programs, has been widely studied in the literature.
Annotation-based empirical performance tuning using Orio
TLDR
We introduce an extensible annotation-based empirical tuning system called Orio that is aimed at improving both performance and productivity.
Parametric multi-level tiling of imperfectly nested loops
TLDR
We present an approach to parametric multi-level tiling of imperfectly nested loops.
DynTile: Parametric tiled loop generation for parallel execution on multicore processors
TLDR
We describe DynTile, a system for transforming untiled sequential input C code containing affine imperfectly nested loops to parametrically tiled code for parallel execution on multicore processors.
Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters
TLDR
This paper presents a new design for MPI intra-node communication that aims to achieve both high performance and good scalability in a cluster environment.
Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations
TLDR
We develop an effective heuristic approach to the operation minimization problem, and demonstrate its effectiveness on tensor contraction expressions for coupled cluster equations.
Parametric Tiling of Affine Loop Nests
Tiling, a key transformation for optimizing programs, has been widely studied in the literature. Parameterized tiled code is important for auto-tuning systems since they often execute a large ...
Towards effective automatic parallelization for multicore systems
TLDR
The ubiquity of multicore processors in commodity computing systems has raised a significant programming challenge for their effective use.