Corpus ID: 9604024

Exploring the Design Space of DRAM Caches

@inproceedings{Hicks2014ExploringTD,
  title={Exploring the Design Space of DRAM Caches},
  author={Matthew Hicks},
  year={2014}
}
Die-stacked DRAM caches represent an emerging technology that offers a new level of cache between SRAM caches and main memory. As compared to SRAM, DRAM caches offer high capacity and bandwidth but incur high access latency costs. Therefore, DRAM caches face new design considerations that include the placement and granularity of tag storage in either DRAM or SRAM. The associativity of the cache and the inherent behavior and constraints of DRAM are also factors to consider in the design of DRAM…

References

SHOWING 1-10 OF 17 REFERENCES
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
  • Gabriel H. Loh, M. Hill
  • Computer Science
  • 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
  • 2011
Die-stacking technology enables multiple layers of DRAM to be integrated with multicore processors. A promising use of stacked DRAM is as a cache, since its capacity is insufficient to be all of main…
Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design
TLDR
This paper proposes a latency-optimized cache architecture, called Alloy Cache, that eliminates the delay due to tag serialization by streaming tag and data together in a single burst, and proposes a simple and highly effective Memory Access Predictor.
Supporting Very Large DRAM Caches with Compound-Access Scheduling and MissMap
This work efficiently enables conventional block sizes for very large die-stacked DRAM caches with two innovations: it makes hits faster with compound-access scheduling and misses faster with a…
Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache
TLDR
This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors that eliminates the excessive off-chip traffic associated with page-based designs, while preserving their high hit ratio, small tag array overhead, and low lookup latency.
Reducing DRAM latencies with an integrated memory hierarchy design
TLDR
It is shown that even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte level-two cache, a processor still spends over half of its time stalling for L2 misses.
A performance comparison of contemporary DRAM architectures
TLDR
A simulation-based performance study of a representative group of DRAM architectures, each evaluated in a small-system organization, reveals that current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem.
DRAMSim2: A Cycle Accurate Memory System Simulator
TLDR
The process of validating DRAMSim2 timing against manufacturer Verilog models in an effort to prove the accuracy of simulation results is described.
3D-Stacked Memory Architectures for Multi-core Processors
  • Gabriel H. Loh
  • Computer Science
  • 2008 International Symposium on Computer Architecture
  • 2008
TLDR
This work explores more aggressive 3D DRAM organizations that make better use of the additional die-to-die bandwidth provided by 3D stacking, as well as the additional transistor count, to achieve a 1.75x speedup over previously proposed 3D-DRAM approaches on memory-intensive multi-programmed workloads on a quad-core processor.
An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth
TLDR
This paper contends that the memory hierarchy, including the L2 cache and DRAM interface, needs to be re-architected so that it can take full advantage of this massive bandwidth, and proposes an efficient mechanism to manage the false sharing problem when implementing SMART-3D in a multi-socket system.
Two fast and high-associativity cache schemes
TLDR
Two schemes for implementing associativity greater than two are proposed: the sequential multicolumn cache, which is an extension of the column-associative cache, and the parallel multicolumn cache. Both can effectively reduce the average access time.