Nathalie Drach-Temam

Hardware and software cache optimizations are active fields of research that have yielded powerful but occasionally complex designs and algorithms. The purpose of this paper is to investigate the performance of combined, though simple, software and hardware optimizations. Because current caches provide little flexibility for exploiting temporal and spatial…
Increasingly complex consumer electronics applications call for embedded processors with higher performance. Multi-core processors can deliver the required performance. However, many of these embedded applications must meet some form of soft real-time constraint, and program behavior on multi-cores is even harder to predict than on single-cores. In…
In this paper, we evaluate the performance of an SMT processor used as the geometry processor for a 3D polygonal rendering engine. To evaluate this approach, we consider PMesa (a parallel version of Mesa), which parallelizes the geometry stage of the 3D pipeline. We show that SMT is well suited to 3D geometry processing, and we characterize the execution of the geometry…
As technology makes it possible to integrate real-time, good-quality 3D rendering on a single chip, the classic problem of the gap between internal data bandwidth and external memory bandwidth arises. The texture mapping function requires a tremendous number of texture accesses, and many past implementations have relied on costly, high-bandwidth external memory. Our…
This paper presents the performance of DSP, image and 3D applications on recent general-purpose microprocessors using streaming SIMD ISA extensions (integer and floating-point). The 9 benchmarks we use for this evaluation have been optimized for data-level parallelism (DLP) and cache use with SIMD extensions and data prefetching. The result of these cumulative optimizations is…
There are two major difficulties in implementing prefetching: avoiding stalling the cache because of prefetch operations, and maintaining coherence between prefetch requests and the cache content. The first constraint is critical because stalling the cache is likely to mean stalling the processor, since superscalar processors can issue up to a cache…
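The two difficulties named in this abstract can be illustrated with a toy software model. This is a minimal sketch under stated assumptions, not the design described in the paper: the class `PrefetchUnit`, the 32-byte line size, the queue depth, and the write-allocate store are all hypothetical. It shows (1) dropping a prefetch instead of stalling when the outstanding-request queue is full, and (2) a coherence check that filters redundant prefetches and discards prefetched data that became stale because a store hit the line while the request was in flight.

```python
from collections import OrderedDict

LINE_SIZE = 32  # bytes per cache line (hypothetical value)


class PrefetchUnit:
    """Toy model of a non-blocking prefetcher with a coherence check.

    Hypothetical model: the cache is a set of resident line addresses,
    and queue_depth bounds the number of outstanding prefetches.
    """

    def __init__(self, queue_depth=4):
        self.queue_depth = queue_depth
        self.cache = set()              # resident line addresses
        self.in_flight = OrderedDict()  # line address -> data still valid?
        self.dropped = 0                # prefetches discarded to avoid stalling

    @staticmethod
    def line_of(addr):
        return addr // LINE_SIZE

    def prefetch(self, addr):
        """Try to issue a prefetch; never blocks. Returns True if queued."""
        line = self.line_of(addr)
        # Coherence/redundancy check: skip lines already resident or pending.
        if line in self.cache or line in self.in_flight:
            return False
        # Non-blocking policy: drop the request rather than stall when full.
        if len(self.in_flight) >= self.queue_depth:
            self.dropped += 1
            return False
        self.in_flight[line] = True
        return True

    def store(self, addr):
        """A store makes any in-flight prefetch of the same line stale."""
        line = self.line_of(addr)
        if line in self.in_flight:
            self.in_flight[line] = False  # returning data must be discarded
        self.cache.add(line)  # write-allocate (hypothetical, modelled as instant)

    def complete_oldest(self):
        """Memory returns the oldest prefetch; fill the cache only if still valid."""
        if not self.in_flight:
            return None
        line, valid = self.in_flight.popitem(last=False)
        if valid:
            self.cache.add(line)
        return line, valid
```

For example, with `queue_depth=2`, prefetching addresses 0, 8, 64 and 128 queues lines 0 and 2, filters address 8 (same line as address 0) and drops line 4 because the queue is full. A subsequent store to address 0 causes the prefetch of line 0 to be discarded when it completes, so stale data never overwrites the newer cache content.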
Parallel architectures, databases, networks and distributed systems
About cache associativity in low-cost shared memory multi-microprocessors
Abstract: In 1993, on-chip cache sizes on current commercial microprocessors range from 16 Kbytes to 36 Kbytes. These microprocessors can be used directly in the design of a low-cost, single-bus shared…