Publications
Clearing the clouds: a study of emerging scale-out workloads on modern hardware
TLDR
This work identifies the key micro-architectural needs of scale-out workloads, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.
Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories
TLDR
Results indicate that gated-Vdd, together with a novel resizable cache architecture, reduces energy-delay by 62% with minimal impact on performance.
Reactive NUCA: near-optimal block placement and replication in distributed caches
TLDR
The paper proposes Reactive NUCA (R-NUCA), a distributed cache design that reacts to the class of each cache access and places blocks at the appropriate location in the cache.
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from...
Spatial Memory Streaming
TLDR
Using cycle-accurate full-system multiprocessor simulation of commercial and scientific applications, the paper demonstrates that spatial memory streaming can on average predict 58% of L1 and 65% of off-chip misses, for a mean performance improvement of 37% and at best 307%.
SimFlex: Statistical Sampling of Computer System Simulation
TLDR
Statistical sampling makes simulation-based studies feasible by providing ten-thousand-fold reductions in simulation runtime and enabling thousand-way simulation parallelism.
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
TLDR
The Sampling Microarchitecture Simulation (SMARTS) framework is presented as an approach to fast and accurate performance measurement of full-length benchmarks; it accelerates simulation by measuring in detail only an appropriate subset of each benchmark.
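As a rough illustration of the statistical-sampling idea summarized above (not the SMARTS simulator itself), the Python sketch below measures only every k-th unit of a synthetic per-unit CPI trace and attaches an approximate 95% confidence interval using the normal approximation. The trace, the sampling interval, and the helper names (`systematic_sample`, `estimate_mean_with_ci`, `required_sample_size`) are hypothetical, and details such as how microarchitectural state is warmed between samples are not modeled.

```python
import math
import random
import statistics


def systematic_sample(units, interval, offset=0):
    """Pick every `interval`-th measurement unit, starting at `offset`."""
    return units[offset::interval]


def estimate_mean_with_ci(samples, z=1.96):
    """Return (sample mean, half-width of an approximate confidence interval).

    Normal approximation: half_width = z * s / sqrt(n); z = 1.96 is ~95%.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    return mean, z * stdev / math.sqrt(n)


def required_sample_size(cv, relative_error, z=1.96):
    """Smallest n such that z * cv / sqrt(n) <= relative_error."""
    return math.ceil((z * cv / relative_error) ** 2)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical full-length run: CPI of 100,000 fixed-size units.
    full_run = [random.gauss(1.5, 0.3) for _ in range(100_000)]

    # Simulate in detail only 1 unit out of every 100.
    sample = systematic_sample(full_run, interval=100)
    mean, half_width = estimate_mean_with_ci(sample)
    print(f"sampled CPI = {mean:.3f} +/- {half_width:.3f} from {len(sample)} units")
    print(f"full-run CPI = {statistics.fmean(full_run):.3f}")
    print("units needed for +/-3% at ~95%:",
          required_sample_size(cv=0.2, relative_error=0.03))
```

The speedup comes from simulating 1,000 units in detail instead of 100,000; the confidence interval indicates whether that subset was large enough, and `required_sample_size` shows how the needed sample count scales with the workload's coefficient of variation.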
Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding
TLDR
The paper proposes two-dimensional (2D) error coding in embedded memories, a scalable multi-bit error protection technique that improves memory reliability and yield, and shows that 2D error coding can correct clustered errors of up to 32×32 bits with significantly smaller performance, area, and power overheads than conventional techniques.
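The core intuition behind combining a per-row (horizontal) check with a per-column (vertical) check can be shown in a few lines. The Python sketch below is a simplified illustration under stated assumptions, not the paper's coding scheme: rows are 64-bit integers, CRC-32 stands in as a hypothetical horizontal detection code, and the vertical code is a single parity row (the XOR of all rows). Once a horizontal check flags a row, the vertical parity rebuilds it, no matter how many bits in that row flipped.

```python
import zlib
from functools import reduce
from operator import xor

ROW_BYTES = 8  # hypothetical 64-bit rows


def row_check(row):
    """Horizontal (per-row) detection code; CRC-32 is an illustrative stand-in."""
    return zlib.crc32(row.to_bytes(ROW_BYTES, "little"))


def encode(rows):
    """Return per-row check codes and the vertical parity row (XOR of all rows)."""
    checks = [row_check(r) for r in rows]
    vertical = reduce(xor, rows, 0)
    return checks, vertical


def correct(rows, checks, vertical):
    """Locate a faulty row with the horizontal codes, then rebuild it from
    the vertical parity. Works for any number of flipped bits, provided
    they all fall in a single row."""
    bad = [i for i, r in enumerate(rows) if row_check(r) != checks[i]]
    if not bad:
        return list(rows)
    if len(bad) > 1:
        raise ValueError("errors in more than one row: not correctable by this sketch")
    i = bad[0]
    others = reduce(xor, (r for j, r in enumerate(rows) if j != i), 0)
    fixed = list(rows)
    fixed[i] = vertical ^ others  # XOR of parity and intact rows = original row i
    return fixed


if __name__ == "__main__":
    data = [0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x0F0F0F0F0F0F0F0F, 0]
    checks, vertical = encode(data)
    corrupted = list(data)
    corrupted[2] ^= 0xFF00  # a clustered 8-bit error within one row
    print("recovered:", correct(corrupted, checks, vertical) == data)
```

Detection is cheap and per-row, while correction uses the shared vertical parity only when an error is found; this split is what lets the 2D arrangement scale to multi-bit clusters without a heavyweight correcting code on every row.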
Dead-block prediction & dead-block correlating prefetchers
TLDR
The paper proposes Dead-Block Predictors (DBPs), trace-based predictors that accurately identify "when" an L1 data cache block becomes evictable, or "dead". A dead-block correlating prefetcher (DBCP) enables effective data prefetching across a wide spectrum of pointer-intensive, integer, and floating-point applications.
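The summary above describes dead-block prediction only at a high level. The Python sketch below is a toy trace-based predictor built on stated assumptions rather than the paper's design: the signature encoding (a truncated sum of the PCs that touch a block), the 2-bit saturating counters, the threshold, and the class and method names are all hypothetical, and the prefetching side (DBCP) is not modeled.

```python
class DeadBlockPredictor:
    """Toy trace-based dead-block predictor (illustrative assumptions only).

    Each resident block carries a signature summarizing the PCs that have
    touched it since it was filled. Signatures seen at eviction train a
    table of 2-bit saturating counters toward "dead"; signatures that were
    followed by another access train toward "live".
    """

    def __init__(self, sig_bits=16, threshold=2):
        self.mask = (1 << sig_bits) - 1
        self.threshold = threshold
        self.counters = {}   # signature -> saturating counter in [0, 3]
        self.block_sig = {}  # block address -> current trace signature

    def _extend(self, sig, pc):
        # Assumed trace encoding: truncated sum of the accessing PCs.
        return (sig + pc) & self.mask

    def _train(self, sig, dead):
        c = self.counters.get(sig, 0)
        self.counters[sig] = min(c + 1, 3) if dead else max(c - 1, 0)

    def access(self, block, pc):
        """Record an access; return True if the block is now predicted dead."""
        prev = self.block_sig.get(block)
        if prev is not None:
            # The previous trace was followed by this access, so it was not dead.
            self._train(prev, dead=False)
        sig = self._extend(prev if prev is not None else 0, pc)
        self.block_sig[block] = sig
        return self.counters.get(sig, 0) >= self.threshold

    def evict(self, block):
        """On eviction, the block's final trace is trained as a dead signature."""
        sig = self.block_sig.pop(block, None)
        if sig is not None:
            self._train(sig, dead=True)


if __name__ == "__main__":
    dbp = DeadBlockPredictor()
    # A repeating pattern: fill by PC 0x10, last touch by PC 0x20, then eviction.
    for generation in range(3):
        dbp.access(0xA0, 0x10)
        dead = dbp.access(0xA0, 0x20)
        print(f"generation {generation}: dead after last touch? {dead}")
        dbp.evict(0xA0)
```

After the pattern repeats enough to saturate the counter, the predictor flags the block as dead immediately after its last touch, well before the actual eviction; that early signal is what a prefetcher could use to reuse the frame.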
Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache
TLDR
This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors that eliminates the excessive off-chip traffic associated with page-based designs, while preserving their high hit ratio, small tag array overhead, and low lookup latency.