Mustafa Karaköy

Learn More
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multiprocessors (CMPs) can be problematic when multiple cores compete for a shared on-chip cache (L2 or L3). In this paper, we (i) quantify the impact of conventional data prefetching(More)
Ensuring that most of data accesses are satisfied from on-chip memories is a critical problem for chip multiprocessors, as the cost of an off-chip access can be very high. Particularly, multiple cores that need to access the off-chip memory system may contend with each other for the same buses/pins to get there. While it is possible to structure on-chip(More)
Silicon technology advances have made it possible to pack millions of transistors --- switching at high clock speeds --- on a single chip. While these advances bring unprecedented performance to electronic products, they pose difficult power/energy consumption problems. For example, large number of transistors in dense on-chip cache memories consume(More)
Chip multiprocessing (or multiprocessor system-on-a-chip) is a technique that combines two or more processor cores on a single piece of silicon to enhance computing performance. An important problem to be addressed in executing applications on an on-chip multiprocessor environment is to select the most suitable number of processors to use for a given(More)
There exist many embedded applications such as those executing on set-top boxes, wireless base stations, HDTV, and mobile handsets that are structured as nested loops and benefit significantly from software managed memory. Prior work on scratchpad memories (SPMs) focused primarily on applications with regular data access patterns. Unfortunately, some(More)
One of the most important issues in designing a chip multiprocessor is to decide its on-chip memory organization. A poor on-chip memory design can have serious power and performance implications when running data-intensive embedded applications. While it is possible to design an application-specific memory architecture, this may not be the best option, in(More)
One of the previously-proposed techniques for reducing memory energy consumption is memory banking. The idea is to divide the memory space into multiple banks and place currently unused (idle) banks into a low-power operating mode. The prior studies -- both hardware and software domain - in memory energy optimization via low-power modes do not take the data(More)
In this work, we propose, implement and test a novel approach to the management of parallel I/O in high-performance computing. Our proposed approach is built upon three complementary ideas: (i) allowing users to place hints into the application code indicating high-level data access patterns, (ii) enabling an optimizing compiler to process these hints and(More)
While loop restructuring based code optimization for array intensive applications has been successful in the past, it has several problems such as the requirement of checking dependences (legality issues) and transformation of all of the array references within the loop body indiscriminately (while some of the references can benefit from the transformation,(More)