Publications
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches
  • M. Qureshi, Y. Patt
  • Computer Science
  • 39th Annual IEEE/ACM International Symposium on…
  • 9 December 2006
This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application that has a high demand and fewer cache resources to the application that has a low demand.
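As a rough illustration of utility-driven partitioning (not the paper's UMON monitors or lookahead allocation), the C sketch below greedily hands cache ways to whichever application would gain the most additional hits from one more way; the utility curves, application count, and way count are hypothetical.

```c
/* Greedy way allocation driven by per-application utility curves.
 * hits[a][w] = hits application a would get with w ways; these numbers are
 * hypothetical, standing in for what the paper's monitors would measure. */
#include <stdio.h>

#define APPS 2
#define WAYS 16

static const int hits[APPS][WAYS + 1] = {
    {0, 30, 55, 70, 80, 86, 90, 93, 95, 96, 97, 97, 98, 98, 98, 98, 98},
    {0,  5, 10, 14, 18, 40, 70, 90, 95, 97, 98, 99, 99, 99, 99, 99, 99},
};

int main(void) {
    int alloc[APPS] = {0};
    for (int w = 0; w < WAYS; w++) {          /* hand out ways one at a time */
        int best = 0, best_gain = -1;
        for (int a = 0; a < APPS; a++) {      /* marginal utility of one more way */
            int gain = hits[a][alloc[a] + 1] - hits[a][alloc[a]];
            if (gain > best_gain) { best_gain = gain; best = a; }
        }
        alloc[best]++;
    }
    for (int a = 0; a < APPS; a++)
        printf("app %d: %d ways\n", a, alloc[a]);
    return 0;
}
```

The paper itself measures the curves at runtime with auxiliary tag directories and uses a lookahead allocation that copes with non-convex utility curves; the greedy loop above is only the simplest stand-in for that idea.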
Adaptive insertion policies for high performance caching
TLDR
A Dynamic Insertion Policy (DIP) is proposed that chooses between BIP and the traditional LRU policy depending on which incurs fewer misses; DIP reduces the average MPKI of the baseline 1MB 16-way L2 cache by 21%, bridging two-thirds of the gap between LRU and OPT.
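A minimal sketch of the set-dueling mechanism behind DIP, assuming illustrative leader-set selection, a 10-bit PSEL counter, and BIP's 1/32 bimodal throttle; only the insertion-position decision on a miss is modeled.

```c
/* Set-dueling sketch for DIP: a few leader sets always use LRU insertion,
 * a few always use BIP (MRU insertion once every 32 fills), and a saturating
 * PSEL counter picks the policy for the remaining follower sets. */
#include <stdbool.h>
#include <stdlib.h>

#define NUM_SETS    1024
#define PSEL_BITS   10
#define PSEL_MAX    ((1 << PSEL_BITS) - 1)
#define BIP_EPSILON 32                 /* BIP: insert at MRU once every 32 fills */

static int psel = PSEL_MAX / 2;        /* mid-point: no preference yet */

static bool is_lru_leader(int set) { return set % 64 == 0; }
static bool is_bip_leader(int set) { return set % 64 == 1; }

/* Called on a miss to `set`; returns true to insert the new line at MRU,
 * false to insert it at the LRU position. */
bool insert_at_mru(int set) {
    bool bip_choice = (rand() % BIP_EPSILON) == 0;
    if (is_lru_leader(set)) {          /* a miss in an LRU leader favors BIP */
        if (psel < PSEL_MAX) psel++;
        return true;                   /* LRU policy: always insert at MRU */
    }
    if (is_bip_leader(set)) {          /* a miss in a BIP leader favors LRU */
        if (psel > 0) psel--;
        return bip_choice;
    }
    /* follower sets: PSEL's MSB says which leader group misses more */
    return (psel >> (PSEL_BITS - 1)) ? bip_choice : true;
}

int main(void) {
    for (int i = 0; i < 1000; i++)     /* drive a few misses across the sets */
        insert_at_mru(i % NUM_SETS);
    return 0;
}
```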
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers
TLDR
Results show that feedback-directed prefetching eliminates the large negative performance impact incurred on some benchmarks due to aggressive prefetching, and that it is applicable to stream-based prefetchers, global-history-buffer-based delta-correlation prefetchers, and PC-based stride prefetchers.
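A minimal sketch of the feedback loop, assuming made-up accuracy and lateness thresholds, a 1-5 aggressiveness scale, and an arbitrary level-to-distance/degree mapping; the paper's cache-pollution feedback is omitted.

```c
/* Feedback-directed throttling sketch: per-interval counters for prefetches
 * issued, useful, and late drive an aggressiveness level that would set the
 * prefetcher's distance and degree. */
#include <stdio.h>

struct pf_feedback {
    unsigned issued, useful, late;   /* counters, reset each interval */
    int level;                       /* 1 (conservative) .. 5 (aggressive) */
};

static void end_of_interval(struct pf_feedback *f) {
    double acc      = f->issued ? (double)f->useful / f->issued : 0.0;
    double lateness = f->useful ? (double)f->late   / f->useful : 0.0;
    if (acc > 0.75 && lateness > 0.10) {
        if (f->level < 5) f->level++;        /* accurate but late: run further ahead */
    } else if (acc < 0.40) {
        if (f->level > 1) f->level--;        /* inaccurate: throttle down */
    }
    f->issued = f->useful = f->late = 0;     /* start the next interval */
}

int main(void) {
    struct pf_feedback f = { .level = 3 };
    f.issued = 100; f.useful = 90; f.late = 20;   /* accurate but often late */
    end_of_interval(&f);
    /* assumed mapping from level to (distance, degree) for illustration */
    printf("level %d -> distance %d, degree %d\n", f.level, 4 * f.level, f.level);
    return 0;
}
```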
Improving GPU performance via large warps and two-level warp scheduling
TLDR
This work proposes two independent ideas: the large warp microarchitecture and two-level warp scheduling, which improve performance by 19.1% over traditional GPU cores for a wide variety of general-purpose parallel applications that heretofore have not been able to fully exploit the available resources of the GPU chip.
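A rough sketch of the two-level scheduling half of the proposal only (the large warp microarchitecture is not modeled), assuming 32 warps split into fetch groups of 8; the stalled[] array, pick_warp interface, and group-switch rule are illustrative simplifications.

```c
/* Two-level warp scheduling sketch: warps are split into fetch groups; the
 * scheduler round-robins within the active group and moves to the next group
 * only when every warp in the current one is stalled, so groups reach their
 * long-latency misses at staggered times. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_WARPS  32
#define GROUP_SIZE 8
#define NUM_GROUPS (NUM_WARPS / GROUP_SIZE)

static bool stalled[NUM_WARPS];    /* set while a warp waits on memory */

/* Returns the warp id to issue from, or -1 if every warp is stalled. */
int pick_warp(int *active_group, int *rr_ptr) {
    for (int g = 0; g < NUM_GROUPS; g++) {
        int group = (*active_group + g) % NUM_GROUPS;
        for (int i = 0; i < GROUP_SIZE; i++) {
            int w = group * GROUP_SIZE + (*rr_ptr + i) % GROUP_SIZE;
            if (!stalled[w]) {
                *active_group = group;
                *rr_ptr = (w % GROUP_SIZE + 1) % GROUP_SIZE;
                return w;
            }
        }
    }
    return -1;
}

int main(void) {
    int group = 0, rr = 0;
    for (int w = 0; w < GROUP_SIZE; w++)
        stalled[w] = true;                               /* group 0 is waiting on memory */
    printf("issue warp %d\n", pick_warp(&group, &rr));   /* falls through to group 1 */
    return 0;
}
```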
Alternative implementations of two-level adaptive branch prediction
TLDR
This work proposes a new dynamic branch predictor that achieves substantially higher accuracy than any other scheme reported in the literature, and measures the effectiveness of different prediction algorithms and different amounts of history and pattern information.
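For concreteness, a minimal GAg-style two-level predictor: a global history register (first level) indexes a pattern history table of 2-bit saturating counters (second level). The 12-bit history length, table size, and the demo branch pattern are assumptions.

```c
/* GAg-style two-level predictor: a global branch history register (level 1)
 * indexes a pattern history table of 2-bit saturating counters (level 2). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define HIST_BITS 12
#define PHT_SIZE  (1 << HIST_BITS)

static uint16_t ghr;                /* level 1: global history register */
static uint8_t  pht[PHT_SIZE];      /* level 2: counters start at 0 (strongly not-taken) */

bool predict(void) {
    return pht[ghr & (PHT_SIZE - 1)] >= 2;     /* MSB of the 2-bit counter */
}

void update(bool taken) {
    uint8_t *ctr = &pht[ghr & (PHT_SIZE - 1)];
    if (taken  && *ctr < 3) (*ctr)++;
    if (!taken && *ctr > 0) (*ctr)--;
    ghr = (uint16_t)((ghr << 1) | taken);      /* shift the outcome into the history */
}

int main(void) {
    bool pattern[] = { true, true, false };    /* a made-up repeating branch pattern */
    int correct = 0, total = 3000;
    for (int i = 0; i < total; i++) {
        bool outcome = pattern[i % 3];
        correct += (predict() == outcome);
        update(outcome);
    }
    printf("accuracy: %.1f%%\n", 100.0 * correct / total);
    return 0;
}
```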
The V-Way cache: demand-based associativity via global replacement
TLDR
The proposed variable-way, or V-Way, set-associative cache achieves an average miss rate reduction of 13% on sixteen benchmarks from the SPEC CPU2000 suite, which translates into an average IPC improvement of 8%.
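A sketch of the two structural ideas in the title, with assumed sizes: a tag store holding twice as many tags per set as the average number of data lines (each tag carries a forward pointer into a global data store), and a clock-like sweep over small reuse counters for global replacement. Details of the paper's reuse-replacement policy are simplified here.

```c
/* V-Way sketch: tag entries carry forward pointers into a global data store,
 * and victims are chosen globally by a clock-like sweep over small per-line
 * reuse counters rather than from within one set. */
#include <stdint.h>
#include <stdio.h>

#define DATA_LINES   1024
#define SETS         128
#define TAGS_PER_SET 16          /* 2x the average data lines per set */

struct tag_entry  { uint64_t tag; int valid; int data_idx; };
struct data_entry { uint8_t reuse; int back_ptr; };   /* back_ptr locates the owning tag */

static struct tag_entry  tags[SETS][TAGS_PER_SET];
static struct data_entry data[DATA_LINES];
static int clock_hand;

/* Global reuse replacement: evict the first line whose counter is zero,
 * decrementing counters as the hand sweeps past them. */
int pick_global_victim(void) {
    for (;;) {
        int idx = clock_hand;
        clock_hand = (clock_hand + 1) % DATA_LINES;
        if (data[idx].reuse == 0) return idx;
        data[idx].reuse--;
    }
}

int main(void) {
    for (int i = 0; i < 4; i++) data[i].reuse = 1;    /* recently reused lines */
    int victim = pick_global_victim();                /* sweeps past 0-3, evicts line 4 */
    tags[0][0] = (struct tag_entry){ .tag = 0xabcd, .valid = 1, .data_idx = victim };
    printf("set 0, way 0 -> data line %d\n", tags[0][0].data_idx);
    return 0;
}
```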
A comparison of dynamic branch predictors that use two levels of branch history
TLDR
This paper shows that the proposed schemes are really nine variations of the same basic model, Two-Level Adaptive Branch Prediction (which Pan, So, and Rahmeh call Correlation Branch Prediction), and studies the effects of different branch history lengths and pattern history table configurations.
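To make the taxonomy concrete, the fragment below contrasts the first-level choice that separates the variations: a single global history register (the GA* schemes) versus per-branch history registers selected by the branch address (the PA* schemes), both feeding a second-level pattern history table. Sizes and the PC hashing are assumptions.

```c
/* First-level choice in the two-level taxonomy: one global history register
 * (GA* schemes) versus per-branch history registers indexed by the branch
 * address (PA* schemes); either one indexes the second-level pattern table. */
#include <stdint.h>
#include <stdio.h>

#define HIST_BITS   10
#define BHT_ENTRIES 1024                     /* per-address history registers */

static uint16_t global_hist;                 /* shared by all branches */
static uint16_t per_addr_hist[BHT_ENTRIES];  /* one per (hashed) branch address */

static uint32_t gag_index(void) {
    return global_hist & ((1u << HIST_BITS) - 1);
}
static uint32_t pag_index(uint64_t pc) {
    return per_addr_hist[(pc >> 2) % BHT_ENTRIES] & ((1u << HIST_BITS) - 1);
}

int main(void) {
    uint64_t pc = 0x400123;
    global_hist = 0x2AB;                     /* pretend some branches were seen */
    per_addr_hist[(pc >> 2) % BHT_ENTRIES] = 0x155;
    printf("GAg index %u, PAg index %u\n", gag_index(), pag_index(pc));
    return 0;
}
```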
Runahead execution: an alternative to very large instruction windows for out-of-order processors
TLDR
This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor without requiring an unreasonably large instruction window.
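A deliberately tiny sketch of the mode transitions, assuming a toy core model with nothing but a program counter: enter runahead when the oldest load misses to memory, keep fetching and executing to generate prefetches, then roll back to the checkpoint when the blocking miss returns. INV-bit propagation and the runahead register state are not modeled.

```c
/* Runahead sketch: checkpoint and keep executing when the oldest load misses
 * to memory, so later misses become prefetches; restore the checkpoint when
 * the blocking miss returns. */
#include <stdio.h>

enum mode { NORMAL, RUNAHEAD };

struct core {
    enum mode mode;
    int checkpoint_pc;      /* where to resume when the blocking miss returns */
    int pc;
};

void on_l2_miss_at_rob_head(struct core *c) {
    if (c->mode == NORMAL) {
        c->checkpoint_pc = c->pc;       /* checkpoint architectural state */
        c->mode = RUNAHEAD;             /* keep going; results are only prefetch hints */
        printf("enter runahead at pc=%d\n", c->pc);
    }
}

void on_blocking_miss_return(struct core *c) {
    if (c->mode == RUNAHEAD) {
        c->pc = c->checkpoint_pc;       /* squash runahead results, resume normally */
        c->mode = NORMAL;
        printf("exit runahead, resume at pc=%d\n", c->pc);
    }
}

void execute_one(struct core *c) {
    c->pc++;          /* in runahead, loads that miss issue prefetches but never block */
}

int main(void) {
    struct core c = { NORMAL, 0, 100 };
    on_l2_miss_at_rob_head(&c);                     /* head-of-ROB load misses to memory */
    for (int i = 0; i < 50; i++) execute_one(&c);   /* runahead warms caches ahead of time */
    on_blocking_miss_return(&c);
    return 0;
}
```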
Scheduling algorithms for modern disk drives
TLDR
This work examines the impact of complex logical-to-physical mappings and large prefetching caches on scheduling effectiveness and concludes that the cyclical scan algorithm, which always schedules requests in ascending logical order, achieves the highest performance among seek-reducing algorithms for such workloads.
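A minimal sketch of the cyclical-scan policy over logical block numbers, assuming an in-memory array of pending requests: always service the smallest pending LBN at or above the last one serviced, wrapping to the overall smallest when nothing lies ahead.

```c
/* Cyclical scan over logical block numbers: service the smallest pending LBN
 * at or above the last one serviced; wrap to the overall smallest otherwise. */
#include <stdio.h>

/* Returns the index of the next request to service, or -1 if none is pending. */
int next_request(const long *lbn, const int *pending, int n, long last_lbn) {
    int best = -1, wrap = -1;
    for (int i = 0; i < n; i++) {
        if (!pending[i]) continue;
        if (lbn[i] >= last_lbn && (best == -1 || lbn[i] < lbn[best])) best = i;
        if (wrap == -1 || lbn[i] < lbn[wrap]) wrap = i;
    }
    return best != -1 ? best : wrap;
}

int main(void) {
    long lbn[]     = { 9000, 200, 5400, 7100 };
    int  pending[] = { 1, 1, 1, 1 };
    long head = 6000;
    for (int served = 0; served < 4; served++) {
        int i = next_request(lbn, pending, 4, head);
        printf("service LBN %ld\n", lbn[i]);        /* order: 7100, 9000, 200, 5400 */
        head = lbn[i];
        pending[i] = 0;
    }
    return 0;
}
```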