As memory speeds grow at a considerably slower rate than processor speeds, memory accesses are starting to dominate the execution time of processors, and this will likely continue into the future. This trend will be exacerbated by growing miss rates due to commercial applications, object-oriented programming and micro-kernel based operating systems. We…
Software prefetching, typically in the context of numeric- or loop-intensive benchmarks, has been proposed as one remedy for the performance bottleneck imposed on computer systems by the cost of servicing cache misses. This paper proposes a new heuristic, SPAID, for utilizing prefetch instructions in pointer- and call-intensive environments. We use trace-driven…
Coarse-grained multithreading, the switching of threads to avoid idle processor time during long-latency events, has been available on IBM systems since 1998. Simultaneous multithreading (SMT), first available on the POWER5 processor, moves beyond simple thread switching to the maintenance of two thread streams that are issued as continuously as possible…
As the degree of instruction-level parallelism in superscalar architectures increases, the gap between processor and memory performance continues to grow, requiring more aggressive techniques to increase the performance of the memory system. We propose a new technique, which is based on the wrong-path execution of loads far beyond instruction fetch-limiting…