GPGPU programmers face a rapidly changing substrate of programming abstractions, execution models, and hardware implementations. Numerous demonstrations, each for a particular conjunction of application kernel, programming language, and GPU hardware instance, have established that it is possible to achieve significant improvements in the…
DARPA's Ubiquitous High-Performance Computing (UHPC) program asked researchers to develop computing systems capable of achieving energy efficiencies of 50 GOPS/Watt, assuming 2018-era fabrication technologies. This paper describes Runnemede, the research architecture developed by the Intel-led UHPC team. Runnemede is being developed through a co-design…
An important way to enhance application performance and reduce power consumption in embedded processor applications is to improve the use of the memory hierarchy. In this paper, a temporal and spatial locality optimization framework for nested loops is proposed, driven by parameterized cost functions. The considered loops can be imperfectly…
One of the most efficient ways to improve program performance on today's computers is to optimize the way cache memories are used. In particular, many scientific applications contain loop nests that operate on large multi-dimensional arrays whose sizes are often parameterized. No special attention is paid to cache memory performance when such loops are…
We propose a new set of automated techniques to optimize memory reuse in programs with explicitly managed memory. Our techniques are inspired by hand-tuned seismic kernels on GPUs. The solutions we develop reduce the cost of transferring data across multiple memories with different bandwidth, latency and addressability properties. They result in reduction…
This paper presents a new method for computing the integer hull of a parameterized rational polyhedron by introducing the concept of a periodic polyhedron. Beyond its general relevance to parametric combinatorial optimization, the method has many applications in the analysis, optimization, and parallelization of loop nests, especially in compilers.