The key to high performance in Simultaneous Multithreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential performance bottlenecks by observing indicators, like instruction occupancy or cache miss counts, and take …
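To make the indicator-driven approach described above concrete, here is a minimal sketch in C. It is purely illustrative, not the hardware mechanism itself: a toy policy that watches per-thread cache-miss counts and issue-queue occupancy each epoch and shifts issue-queue entries between two threads. The structure names, thresholds, and the two-thread setup are assumptions made for the example.

    /* Illustrative sketch only, not the hardware mechanism: an
       indicator-driven policy that redistributes issue-queue entries
       between two SMT threads based on observed cache-miss counts. */
    #include <stdio.h>

    #define TOTAL_IQ_ENTRIES 64
    #define STEP 4

    struct thread_stats {
        unsigned long cache_misses;   /* indicator: cache miss count  */
        unsigned      iq_occupancy;   /* indicator: issue-queue usage */
    };

    static unsigned iq_share[2] = { 32, 32 };   /* current partition */

    /* Each epoch, take entries from the thread with more misses (it is
       likely clogging the queue with stalled instructions) and give them
       to the other thread.  Performance is optimized only indirectly:
       the policy reacts to indicators, not to measured performance. */
    static void rebalance(const struct thread_stats s[2])
    {
        int loser  = (s[0].cache_misses > s[1].cache_misses) ? 0 : 1;
        int winner = 1 - loser;

        if (iq_share[loser] > STEP) {
            iq_share[loser]  -= STEP;
            iq_share[winner] += STEP;
        }
    }

    int main(void)
    {
        struct thread_stats epoch[2] = {
            { .cache_misses = 900, .iq_occupancy = 50 },
            { .cache_misses = 120, .iq_occupancy = 10 },
        };

        rebalance(epoch);
        printf("issue-queue split: T0=%u T1=%u (of %d)\n",
               iq_share[0], iq_share[1], TOTAL_IQ_ENTRIES);
        return 0;
    }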
Simultaneous Multithreading (SMT) processors achieve high processor throughput at the expense of single-thread performance. This paper investigates resource allocation policies for SMT processors that preserve, as much as possible, the single-thread performance of designated "foreground" threads, while still permitting other "background" threads to …
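As a rough illustration of the kind of policy this snippet describes, the sketch below gives a designated foreground thread first claim on fetch slots and lets background threads use only what it leaves idle. The fetch width, thread count, and per-cycle grant structure are assumptions made for the example, not the paper's actual mechanism.

    /* Illustrative sketch only (names and widths are assumptions): a toy
       per-cycle fetch-slot allocator that gives a designated foreground
       thread first claim on fetch bandwidth and lets background threads
       consume only the slots it leaves unused, e.g. while it is stalled. */
    #include <stdbool.h>
    #include <stdio.h>

    #define FETCH_WIDTH 8
    #define NUM_THREADS 3
    #define FG_THREAD   0   /* thread 0 is the designated foreground thread */

    static void allocate_fetch(const bool stalled[NUM_THREADS],
                               const int  demand[NUM_THREADS],
                               int        grant[NUM_THREADS])
    {
        int remaining = FETCH_WIDTH;

        for (int t = 0; t < NUM_THREADS; t++)
            grant[t] = 0;

        /* Foreground thread first: preserve its single-thread fetch rate. */
        if (!stalled[FG_THREAD]) {
            grant[FG_THREAD] = demand[FG_THREAD] < remaining
                             ? demand[FG_THREAD] : remaining;
            remaining -= grant[FG_THREAD];
        }

        /* Background threads share whatever is left over. */
        for (int t = 0; t < NUM_THREADS; t++) {
            if (t == FG_THREAD || stalled[t])
                continue;
            grant[t] = demand[t] < remaining ? demand[t] : remaining;
            remaining -= grant[t];
        }
    }

    int main(void)
    {
        bool stalled[NUM_THREADS] = { false, false, false };
        int  demand[NUM_THREADS]  = { 6, 8, 8 };
        int  grant[NUM_THREADS];

        allocate_fetch(stalled, demand, grant);
        for (int t = 0; t < NUM_THREADS; t++)
            printf("thread %d gets %d fetch slots\n", t, grant[t]);
        return 0;
    }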
Pointer-chasing applications tend to traverse composite data structures consisting of multiple independent pointer chains. While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism. This paper presents multi-chain prefetching, …
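A small C sketch of the data-structure shape being described may help: within one chain each load's address comes from the previous load, so misses serialize, while the chain heads expose independent work whose misses can overlap. The node layout, chain count, and chain length are illustrative assumptions.

    /* Sketch of the data-structure shape described above: a composite
       structure made of several independent pointer chains. */
    #include <stdlib.h>

    struct node { int value; struct node *next; };

    /* Serialized: the address of each load comes from the previous load,
       so cache misses along one chain cannot be overlapped. */
    static long walk_chain(const struct node *p)
    {
        long sum = 0;
        while (p) { sum += p->value; p = p->next; }
        return sum;
    }

    /* Inter-chain parallelism: heads[] hands the processor (or a prefetch
       engine) several independent chains, so misses from different chains
       can be in flight at the same time. */
    static long walk_all(struct node *heads[], int nchains)
    {
        long sum = 0;
        for (int i = 0; i < nchains; i++)
            sum += walk_chain(heads[i]);
        return sum;
    }

    int main(void)
    {
        enum { NCHAINS = 4, LEN = 1000 };
        struct node *heads[NCHAINS];

        for (int i = 0; i < NCHAINS; i++) {      /* build NCHAINS chains */
            heads[i] = NULL;
            for (int j = 0; j < LEN; j++) {
                struct node *n = malloc(sizeof *n);
                n->value = j;
                n->next  = heads[i];
                heads[i] = n;
            }
        }
        return (int)(walk_all(heads, NCHAINS) & 0x7f);
    }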
This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-chain prefetching tolerates serialized memory latency commonly found in pointer-chasing codes via aggressive prefetch scheduling. Unlike conventional prefetching techniques that hide memory latency underneath a single traversal loop or recursive function …
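The sketch below is only a software analogue of the scheduling idea: the paper's multi-chain prefetching drives a hardware prefetch engine from a precomputed schedule, whereas here a plain loop simply starts walking (and prefetching) the next independent chain while the current one is being traversed. __builtin_prefetch is the GCC/Clang builtin; the prefetch depth and data layout are assumptions.

    /* Software analogue only of the prefetch-scheduling idea described
       above: launch the next independent chain early so its serialized
       misses overlap with work on the current chain. */
    #include <stddef.h>

    struct node { int value; struct node *next; };

    #define PREFETCH_DEPTH 4   /* how far ahead to run down the next chain */

    long traverse_with_prefetch(struct node *heads[], int nchains)
    {
        long sum = 0;

        for (int i = 0; i < nchains; i++) {
            /* Start the next chain early. */
            if (i + 1 < nchains) {
                struct node *q = heads[i + 1];
                for (int d = 0; d < PREFETCH_DEPTH && q; d++) {
                    __builtin_prefetch(q, 0, 1);  /* read, low temporal locality */
                    q = q->next;  /* still pointer chasing: the look-ahead walk
                                     itself is serialized within its chain */
                }
            }

            /* Demand traversal of the current chain. */
            for (struct node *p = heads[i]; p; p = p->next)
                sum += p->value;
        }
        return sum;
    }

    int main(void)
    {
        struct node a = { 1, NULL }, b = { 2, &a };   /* chain 0: b -> a */
        struct node c = { 3, NULL }, d = { 4, &c };   /* chain 1: d -> c */
        struct node *heads[2] = { &b, &d };

        return (int)traverse_with_prefetch(heads, 2);
    }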
Pointer-chasing applications tend to traverse composite data structures consisting of multiple independent pointer chains. While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism. This article investigates exploiting such …
The key to high performance in SMT processors lies in optimizing the distribution of shared resources among simultaneously executing threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential performance bottlenecks by observing indicators, like instruction occupancy or cache miss counts, and take actions to …
As the processor-memory performance gap continues to widen, application performance becomes increasingly limited by the memory system. Applications that employ linked data structures (LDSs) are particularly challenging from the standpoint of the memory system because of the memory serialization effects associated with indirect memory addressing. Also known …
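A back-of-envelope example of the serialization effect: if every node in a chain misses in the cache, the dependent loads cannot overlap, so traversal time grows as chain length times miss latency, and independent chains bound how much of that latency could ideally be hidden. The latency and size numbers below are assumptions chosen only to show the arithmetic.

    /* Back-of-envelope sketch of the serialization effect.  The latency,
       chain-length, and chain-count numbers are illustrative assumptions. */
    #include <stdio.h>

    int main(void)
    {
        const double miss_latency_ns = 100.0;  /* assumed memory miss latency */
        const int    chain_length    = 1000;   /* nodes per pointer chain     */
        const int    num_chains      = 4;      /* independent chains          */

        /* Within one chain the address of each node comes from the previous
           node's next pointer, so misses cannot overlap: latency accumulates. */
        double one_chain = chain_length * miss_latency_ns;

        /* Independent chains can, in the best case, have their misses in
           flight concurrently; that overlap is the memory parallelism that
           multi-chain prefetching tries to exploit. */
        double all_serialized = num_chains * one_chain;
        double ideal_overlap  = one_chain;

        printf("fully serialized traversal : %.0f ns\n", all_serialized);
        printf("ideal overlapped traversal : %.0f ns\n", ideal_overlap);
        return 0;
    }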