Steven R. Kunkel

Learn More
As memory speeds grow at a considerably slower rate than processor speeds, memory accesses are starting to dominate the execution time of processors, and this will likely continue into the future. This trend will be exacerbated by growing miss rates due to commercial applications, object-oriented programming and micro-kernel based operating systems. We(More)
Software prefetching, typically in the context of numeric-or loop-intensive benchmarks, has been proposed as one remedy for the performance bottleneck imposed on computer systems by the cost of servicing cache misses. This paper proposes a new heuristic–SPAID–for utilizing prefetch instructions in pointer-and call-intensive environments. We use trace-driven(More)
Coarse-grained multithreading, the switching of threads to avoid idle processor time during long-latency events, has been available on IBM systems since 1998. Simultaneous multithreading (SMT), first available on the POWER5e processor, moves beyond simple thread switching to the maintenance of two thread streams that are issued as continuously as possible(More)
As the degree of instruction-level parallelism in superscalar architectures increases, the gap between processor and memory performance continues to grow requiring more aggressive techniques to increase the performance of the memory system. We propose a new technique, which is based on the wrong-path execution of loads far beyond instruction fetch-limiting(More)
We investigate directions for exploiting what might be termed pattern locality in a cache hierarchy, based on recording cache discards or victims. An advantage of storing discard decisions is the reduced duplication of pertinent information, as well as the maintenance of information on the current location of discarded lines. Typical caches are designed to(More)