A case for two-way skewed-associative caches
Two-way skewed-associative caches represent the best tradeoff for today's microprocessors with on-chip caches whose sizes are in the range of 4-8 Kbytes.
The L-TAGE Branch Predictor
Zero-content augmented caches
For applications that manipulate large numbers of null data blocks, such a ZC cache significantly reduces the miss rate and memory traffic, and therefore increases performance, for a small hardware overhead.
A new case for the TAGE branch predictor
  • André Seznec
  • Computer Science
  • 44th Annual IEEE/ACM International Symposium on…
  • 3 December 2011
The TAGE predictor is often considered the state of the art among conditional branch predictors proposed by academia; this paper shows how to further reduce the misprediction rate of TAGE by augmenting it with small side predictors.
Design tradeoffs for the Alpha EV8 conditional branch predictor
This paper presents the Alpha EV8 conditional branch predictor. The Alpha EV8 microprocessor project, canceled in June 2001 in a late phase of development, envisioned an aggressive 8-wide issue…
Practical data value speculation for future high-end processors
A new value predictor, VTAGE, which harnesses the global branch history, is introduced. It can seamlessly predict back-to-back occurrences, allowing predictions to span several cycles, and achieves higher performance than previously proposed context-based predictors.
Choosing representative slices of program execution for microarchitecture simulations: a preliminary
This chapter proposes a technique to choose a few program execution slices representative of the entire execution. It characterizes the behavior of each consecutive slice executed and uses a statistical classification method to discriminate the execution slices and select the representative ones.
Tarantula: a vector extension to the Alpha architecture
Tarantula is an aggressive floating-point machine targeted at technical, scientific and bioinformatics workloads that fully integrates into a virtual-memory cache-coherent system without changes to its coherency protocol, and achieves excellent "real-computation" per transistor and per watt ratios.
Practical and secure PCM systems by online detection of malicious write streams
A practical wear-leveling framework that can provide years of lifetime under attacks while still incurring negligible (<1%) write overhead for typical applications is proposed.