Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches
- Moinuddin K. Qureshi, Y. Patt
- Computer Science · 39th Annual IEEE/ACM International Symposium on…
- 9 December 2006
This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand…
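The partitioning idea can be illustrated with a small lookahead allocator. This is written from our reading of UCP and is not code taken from the paper. It makes two assumptions:

- Each application's miss curve comes from per-LRU-position hit counters. This is the role UCP assigns to its utility monitors.
- Ways are repeatedly granted to whichever application saves the most misses per way, over any lookahead distance.

The function names and the example counter values are hypothetical. Details such as monitor sampling and any minimum per-application allocation are not modeled.

```python
from typing import List, Sequence


def misses_with_ways(hits_by_pos: Sequence[int], accesses: int, ways: int) -> int:
    """Misses an application would incur if it owned `ways` ways.

    hits_by_pos[i] counts hits observed at LRU stack position i (0 = MRU).
    By LRU's stack property, hits with w ways = sum of the first w counters.
    """
    return accesses - sum(hits_by_pos[:ways])


def lookahead_partition(hit_curves: List[Sequence[int]],
                        accesses: List[int],
                        total_ways: int) -> List[int]:
    """Split `total_ways` among applications by maximum marginal utility."""
    alloc = [0] * len(hit_curves)
    balance = total_ways
    while balance > 0:
        best_app, best_mu, best_k = 0, -1.0, 1
        for app, curve in enumerate(hit_curves):
            base = misses_with_ways(curve, accesses[app], alloc[app])
            # Look ahead over every possible extra allocation k (not just one way)
            # so that a miss-curve "cliff" several ways away is still discovered.
            for k in range(1, balance + 1):
                saved = base - misses_with_ways(curve, accesses[app], alloc[app] + k)
                mu = saved / k  # misses saved per way granted
                if mu > best_mu:  # strict '>' keeps the smallest k on ties
                    best_app, best_mu, best_k = app, mu, k
        alloc[best_app] += best_k
        balance -= best_k
    return alloc


# Hypothetical monitor readings for a 4-way cache shared by two applications.
# App 0 gets all its reuse from one way; app 1 needs three ways before it hits.
print(lookahead_partition([[10, 0, 0, 0], [0, 0, 12, 0]], [20, 20], 4))  # [1, 3]
```

After the first way goes to app 0, granting one more way saves zero misses for either application. An allocator that only looked one way ahead would therefore not discover that app 1 benefits from three ways. The lookahead step finds this.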
Adaptive insertion policies for high performance caching
A Dynamic Insertion Policy (DIP) is proposed that chooses between the Bimodal Insertion Policy (BIP) and the traditional LRU policy, depending on which incurs fewer misses. DIP reduces the average MPKI of the baseline 1MB 16-way L2 cache by 21%, bridging two-thirds of the gap between LRU and OPT.
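A minimal set-dueling sketch of the DIP idea, assuming a toy cache model:

- A few "leader" sets always use LRU, and a few always use BIP.
- A saturating counter, called PSEL here, counts misses in those leaders. It decides which policy the remaining "follower" sets use.
- BIP inserts incoming blocks at the LRU position, except that with a small probability ε it inserts at MRU.

The class name, leader-set layout, counter width and ε = 1/32 are assumptions chosen for illustration, not necessarily the paper's exact configuration.

```python
import random


class DIPCache:
    """Toy set-associative cache with DIP-style set dueling (simplified sketch)."""

    def __init__(self, num_sets=32, ways=4, epsilon=1 / 32, psel_bits=10,
                 leaders_per_policy=4, seed=0):
        self.num_sets, self.ways, self.epsilon = num_sets, ways, epsilon
        # Each set is a recency list: index 0 is MRU, the last element is LRU.
        self.sets = [[] for _ in range(num_sets)]
        self.psel_max = (1 << psel_bits) - 1
        self.threshold = 1 << (psel_bits - 1)   # PSEL MSB set => followers use BIP
        self.psel = self.threshold - 1          # start just on the LRU side
        self.stride = num_sets // leaders_per_policy
        self.rng = random.Random(seed)          # seeded for reproducible runs

    def _role(self, set_idx):
        # Assumed layout: one LRU leader and one BIP leader per stride of sets.
        r = set_idx % self.stride
        return "lru" if r == 0 else "bip" if r == 1 else "follower"

    def uses_bip(self):
        return self.psel >= self.threshold

    def access(self, addr):
        set_idx, tag = addr % self.num_sets, addr // self.num_sets
        s = self.sets[set_idx]
        if tag in s:                      # hit: promote to MRU under both policies
            s.remove(tag)
            s.insert(0, tag)
            return True
        role = self._role(set_idx)
        if role == "lru":                 # a miss in an LRU leader votes for BIP
            self.psel = min(self.psel + 1, self.psel_max)
        elif role == "bip":               # a miss in a BIP leader votes for LRU
            self.psel = max(self.psel - 1, 0)
        policy = role if role != "follower" else ("bip" if self.uses_bip() else "lru")
        if len(s) == self.ways:
            s.pop()                       # evict the LRU block
        if policy == "lru" or self.rng.random() < self.epsilon:
            s.insert(0, tag)              # MRU insertion
        else:
            s.append(tag)                 # BIP: usually insert at the LRU position
        return False


# A cyclic working set of 8 blocks per 4-way set thrashes under plain LRU.
cache = DIPCache()
hits = sum(cache.access(a) for _ in range(50) for a in range(8 * cache.num_sets))
print(cache.uses_bip(), hits > 0)
```

Under this cyclic pattern, LRU evicts every block just before its reuse, so the LRU leaders miss on every access. The BIP leaders keep part of the working set resident and miss less. PSEL therefore drifts toward BIP, and the follower sets switch over.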
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers
- S. Srinath, O. Mutlu, Hyesoon Kim, Y. Patt
- Computer Science · IEEE 13th International Symposium on High…
- 10 February 2007
Results show that feedback-directed prefetching eliminates the large negative performance impact that prefetching incurs on some benchmarks. The technique is applicable to stream-based prefetchers, global-history-buffer-based delta correlation prefetchers and PC-based stride prefetchers.
Runahead execution: an alternative to very large instruction windows for out-of-order processors
- O. Mutlu, J. Stark, C. Wilkerson, Y. Patt
- Computer Science · The Ninth International Symposium on High…
- 8 February 2003
This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor without requiring an unreasonably large instruction window.
Improving GPU performance via large warps and two-level warp scheduling
- V. Narasiman, M. Shebanow, Chang Joo Lee, Rustam Miftakhutdinov, O. Mutlu, Y. Patt
- Computer Science · 44th Annual IEEE/ACM International Symposium on…
- 3 December 2011
This work proposes two independent ideas, the large warp microarchitecture and two-level warp scheduling. They improve performance by 19.1% over traditional GPU cores across a wide variety of general-purpose parallel applications that previously could not fully exploit the GPU chip's resources.
The V-Way cache: demand-based associativity via global replacement
- Moinuddin K. Qureshi, D. Thompson, Y. Patt
- Computer Science · 32nd International Symposium on Computer…
The proposed variable-way, or V-Way, set-associative cache achieves an average miss rate reduction of 13% on sixteen benchmarks from the SPEC CPU2000 suite, which translates into an average IPC improvement of 8%.
Alternative Implementations of Two-Level Adaptive Branch Prediction
This work proposes a new dynamic branch predictor that achieves substantially higher accuracy than any other scheme reported in the literature, and measures the effectiveness of different prediction algorithms and different amounts of history and pattern information.
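The two levels of history and pattern information can be sketched with the simplest global arrangement. One global history register indexes one table of 2-bit saturating counters, in the spirit of the GAg variant. The class name, history length and counter initialization are illustrative assumptions, not the paper's evaluated configurations.

```python
class TwoLevelGAg:
    """Global history register + global pattern history table of 2-bit counters."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = 0                          # first level: last k outcomes
        self.pht = [1] * (1 << history_bits)      # second level: 1 = weakly not-taken

    def predict(self):
        return self.pht[self.history] >= 2        # counter 2 or 3 => predict taken

    def update(self, taken):
        c = self.pht[self.history]
        self.pht[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


# An alternating branch (T, N, T, N, ...) is mispredicted often by one 2-bit
# counter, but is learned exactly once the history separates the two phases.
bp = TwoLevelGAg()
mispredicts = []
for i in range(100):
    taken = i % 2 == 0
    mispredicts.append(bp.predict() != taken)
    bp.update(taken)
print(sum(mispredicts), any(mispredicts[50:]))  # 3 False
```

The three mispredictions all occur during warm-up, while each new history pattern trains its own counter. After that, every history value maps to a single outcome.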
Scheduling algorithms for modern disk drives
This work examines the impact of complex logical-to-physical mappings and large prefetching caches on scheduling effectiveness. It concludes that the cyclical scan algorithm (C-LOOK), which always schedules requests in ascending logical order, achieves the highest performance among seek-reducing algorithms for such workloads.
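A sketch of the C-LOOK ordering, assuming a static batch of pending requests identified by logical block number (LBN) and a known current head position. Requests at or beyond the head are served in ascending order; the scheduler then wraps to the lowest pending LBN and continues upward. The function name is hypothetical, and a real scheduler would repeat this choice as new requests arrive.

```python
from typing import Iterable, List


def c_look_order(pending: Iterable[int], head: int) -> List[int]:
    """Service order for a batch of logical block numbers under C-LOOK."""
    pending = list(pending)
    ahead = sorted(b for b in pending if b >= head)   # sweep upward from the head
    behind = sorted(b for b in pending if b < head)   # then wrap to the lowest LBN
    return ahead + behind


print(c_look_order([98, 183, 37, 122, 14, 124, 65, 67], head=53))
# [65, 67, 98, 122, 124, 183, 14, 37]
```

Because the sweep only ever moves in ascending logical order, requests behind the head wait at most one wrap-around rather than being starved.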
A Case for MLP-Aware Cache Replacement
- Moinuddin K. Qureshi, Daniel N. Lynch, O. Mutlu, Y. Patt
- Computer Science · 33rd International Symposium on Computer…
- 1 May 2006
Evaluations with the SPEC CPU2000 benchmarks show that MLP-aware cache replacement can improve performance by as much as 23%. The work also proposes a novel, low-hardware-overhead mechanism called sampling-based adaptive replacement (SBAR). SBAR dynamically chooses between an MLP-aware and a traditional replacement policy, depending on which one is more effective at reducing the number of memory-related stalls.