The determination of upper bounds on execution times, commonly called worst-case execution times (WCETs), is a necessary step in the development and validation process for hard real-time systems.…
Previous timing analysis methods have assumed that the worst-case instruction execution time necessarily corresponds to the worst-case behavior. We show that this assumption is wrong in dynamically…
Two interesting variations of large-scale shared-memory machines that have recently emerged are cache-coherent non-uniform-memory-access machines (CC-NUMA) and cache-only…
System level simulators allow computer architects and system software designers to recreate an accurate and complete replica of the program behavior of a target system, regardless of the…
To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed. A major…
Lossless data compression techniques can potentially free up more than 50% of the memory resources. However, previously proposed schemes suffer from high access costs. The proposed main-memory…
The significant speed-gap between processor and memory and the limited chip memory bandwidth make last-level cache performance crucial for future chip multiprocessors. To use the capacity of shared…
Parallel programs that use critical sections and are executed on a shared-memory multiprocessor with a write-invalidate protocol result in invalidation actions that could be eliminated. For this type…
Prefetching offers the potential to improve the performance of linked data structure (LDS) traversals. However, previously proposed prefetching methods only work well when there is enough work…