Nathalie Drach-Temam

Hardware and software cache optimizations are active fields of research that have yielded powerful but occasionally complex designs and algorithms. The purpose of this paper is to investigate the performance of simple, combined software and hardware optimizations. Because current caches provide little flexibility for exploiting temporal and spatial …
Increasingly complex consumer electronics applications call for embedded processors with higher performance. Multi-cores are capable of delivering the required performance. However, many of these embedded applications must meet some form of soft real-time constraints, and program behavior on multi-cores is even harder to predict than on single-cores. In …
In this paper we evaluate the performance of an SMT processor used as the geometry processor for a 3D polygonal rendering engine. To evaluate this approach, we consider PMesa (a parallel version of Mesa) which parallelizes the geometry stage of the 3D pipeline. We show that SMT is suitable for 3D geometry and we characterize the execution of the geometry …
There are two major difficulties in implementing prefetching: avoiding stalling the cache because of prefetch operations, and maintaining coherence between prefetch requests and the cache content. The first constraint is critical because stalling the cache is likely to mean stalling the processor, since superscalar processors can issue up to a cache …
As technology enables the integration of real-time, good-quality 3D rendering in a single chip, the classical problem of the gap between internal data bandwidth and external memory bandwidth arises. The texture mapping function requires a tremendous number of texture accesses, and many past implementations have been based on costly high-bandwidth external memory. Our …
This paper presents the performance of DSP, image and 3D applications on recent general-purpose microprocessors using streaming SIMD ISA extensions (integer and floating point). The 9 benchmarks we use for this evaluation have been optimized for DLP and cache use with SIMD extensions and data prefetch. The result of these cumulated optimizations is …
The purpose of the semi-unified on-chip cache organization is to use the data cache (resp. instruction cache) as an on-chip second-level cache for instructions (resp. data). Thus the associativity degree of both on-chip caches is artificially increased, and the cache spaces respectively devoted to instructions and data are dynamically adjusted. The …
We present Aftermath, an open source graphical tool designed to assist in the performance debugging process of task-parallel programs by visualizing, filtering and analyzing execution traces interactively. To efficiently exploit increasingly complex and concurrent hardware architectures, both the application and the run-time system that manages task …
Parallel architectures, databases, networks and distributed systems. About cache associativity in low-cost shared memory multi-microprocessors. Abstract: In 1993, sizes of on-chip caches on current commercial microprocessors range from 16 Kbytes to 36 Kbytes. These microprocessors can be directly used in the design of a low-cost single-bus shared …
To meet the high demand for powerful embedded processors, VLIW architectures are increasingly complex (e.g., multiple clusters), and moreover, they now run increasingly sophisticated control-intensive applications. As a result, developing architecture-specific compiler optimizations is becoming both increasingly critical and complex, while time-to-market …