Increasing demand for performance and efficiency has driven the computer industry toward multicore systems. These systems have become the industry standard in almost all segments of the computer market, from high-end servers to handheld devices. In order to efficiently use these systems, an extensive amount of research and industry support has been devoted…
In this article, we present a novel linear-time algorithm for <i>data remapping</i> that is (i) lightweight; (ii) fully automated; and (iii) applicable in the context of pointer-centric programming languages with dynamic memory allocation support. All previous work in this area lacks one or more of these features. We proceed to demonstrate a <i>novel</i>…
The Trimaran compiler research infrastructure for instruction level parallelism. … pipelining and instruction level parallelism (ILP). … placing data/instructions into the appropriate level of SPM to achieve the best … We have undertaken comprehensive research work on instruction level…
The halt in clock frequency scaling has forced architects and language designers to look elsewhere for continued improvements in performance. We believe that extracting maximum performance will require compilation to highly heterogeneous architectures that include reconfigurable hardware. We present a new language, Lime, which is designed to be executable…
Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly prevalent in the embedded domain. We exploit the widespread parallelism and regular communication patterns in stream programs to formulate a set of cache-aware optimizations that…
Image and video codecs are prevalent in multimedia devices, ranging from embedded systems to desktop computers to high-end servers such as HDTV editing consoles. It is not uncommon, however, for developers to create and customize separate coder and decoder implementations for each of the architectures they target. This practice is time consuming and error…
This paper introduces the concept of <i>programming with sketches</i>, an approach for the rapid development of high-performance applications. This approach allows a programmer to write clean and portable reference code, and then obtain a high-quality implementation by simply <i>sketching</i> the outlines of the desired implementation. Subsequently, a…
Languages such as OpenCL and CUDA offer a standard interface for general-purpose programming of GPUs. However, with these languages, programmers must explicitly manage numerous low-level details involving communication and synchronization. This burden makes programming GPUs difficult and error-prone, rendering these powerful devices inaccessible to most…
In this paper, we provide a novel compile-time <i>data remapping</i> algorithm that runs in linear time. This remapping algorithm is the first fully automatic approach applicable to pointer-intensive dynamic applications. We show that data remapping can be used to significantly reduce the <i>energy consumed</i> as well as the <i>memory size</i> needed to…