3D-Stacked Memory Architectures for Multi-core Processors

@article{Loh20083DStackedMA,
  title={3D-Stacked Memory Architectures for Multi-core Processors},
  author={Gabriel H. Loh},
  journal={2008 International Symposium on Computer Architecture},
  year={2008},
  pages={453--464}
}
  • Gabriel H. Loh
  • Published 1 June 2008
  • Computer Science
  • 2008 International Symposium on Computer Architecture
Three-dimensional integration enables stacking memory directly on top of a microprocessor, thereby significantly reducing wire delay between the two. […] Key result: Our simulation results show that with a few simple changes to the 3D-DRAM organization, we can achieve a 1.75x speedup over previously proposed 3D-DRAM approaches on our memory-intensive multi-programmed workloads on a quad-core processor. The significant increase in memory system performance makes the L2 miss handling architecture (MHA) a new…

Near Data Processing: Impact and Optimization of 3D Memory System Architecture on the Uncore
TLDR
To reduce the latency and traffic on the network, this paper proposes restructuring the memory hierarchy to a memory-side cache organization and also explores the effects of various address translations and OS page allocation strategies.
Improving VLIW Processor Performance Using Three-Dimensional (3D) DRAM Stacking
  • Yangyang Pan, Tong Zhang
  • Computer Science
    2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors
  • 2009
TLDR
From the simulation results, it is found that 3D-stacked DRAM main memory can improve system performance by 10% to 80% over 2D off-chip DRAM main memory, depending on the benchmark.
3D Logic-Memory Integration for High Performance Embedded System and Reconfigurable Computing
TLDR
It is shown that such DRAM-based FPGAs can largely reduce the FPGA footprint, which can further translate into significant speed and energy-efficiency improvements compared to SRAM-based FPGAs.
Energy-Efficient Monolithic Three-Dimensional On-Chip Memory Architectures
  • Y. Yu, N. Jha
  • Computer Science
    IEEE Transactions on Nanotechnology
  • 2018
TLDR
An efficient memory interface for monolithic 3D-stacked RAM (both DRAM and NVRAMs such as resistive RAM and nanotube RAM) is presented, which takes advantage of the tremendous bandwidth made available by MIVs to implement an on-chip memory bus in order to hide the latency of large data transfers.
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy
  • Gabriel H. Loh
  • Computer Science
    2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
  • 2009
TLDR
This work proposes a cache where each set is organized as multiple logical FIFO or queue structures that simultaneously provide performance isolation between threads as well as reduce the number of entries occupied by dead lines.
An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth
TLDR
This paper contends that the memory hierarchy, including the L2 cache and DRAM interface, needs to be re-architected so that it can take full advantage of this massive bandwidth, and proposes an efficient mechanism to manage the false sharing problem when implementing SMART-3D in a multi-socket system.
A high-performance multiported L2 memory IP for scalable three-dimensional integration
TLDR
A scalable 3D nonuniform memory access (NUMA) architecture, based on low latency logarithmic interconnects, which allows stacking of multiple memory layers with identical dies, supports multiple outstanding transactions, and achieves high clock frequencies due to its highly pipelined nature is proposed.
Asymmetric DRAM synthesis for heterogeneous chip multiprocessors in 3D-stacked architecture
TLDR
This work proposes an asymmetric 3D-stacked DRAM architecture where the DRAM die is divided into multiple segments and the segments are optimized for different memory requirements, which can be different for different heterogeneous CMPs.
Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os
  • D. W. Chang, Y. Son, N. Kim
  • Computer Science
    2013 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
  • 2013
TLDR
This paper analyzes memory latency reduction opportunities in a 3D main memory system with Wide I/O by taking better advantage of 3D integration technology and quantifies their benefit, and proposes to dynamically scale memory bandwidth at runtime based on an application's program phases.
An efficient distributed memory interface for many-core platform with 3D stacked DRAM
  • Igor Loi, L. Benini
  • Computer Science
    2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)
  • 2010
TLDR
This paper presents an efficient and flexible distributed memory interface for 3D-stacked DRAM that ensures ultra-low-latency access to the memory modules on top of each processing element (vertically local memory neighborhoods) and takes full advantage of the lower latency of vertical interconnect.

References

SHOWING 1-10 OF 57 REFERENCES
Bridging the processor-memory performance gap with 3D IC technology
TLDR
It is shown that reducing memory latency by bringing main memory on chip gives near-perfect performance, and three-dimensional IC technology can provide the much needed bandwidth without the cost, design complexity, and power issues associated with a large number of off-chip pins.
Die Stacking (3D) Microarchitecture
  • B. Black, M. Annavaram, C. Webb
  • Engineering
    2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)
  • 2006
TLDR
This research studies the performance advantages and thermal challenges of two forms of die stacking: stacking a large DRAM or SRAM cache on a microprocessor, and dividing a traditional microarchitecture between two dies in a stack.
A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy
TLDR
This work is the first attempt to study the performance benefits of 3D technology under the influence of thermal constraints, and it is shown that the 3D system registers large performance improvement for memory intensive applications.
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
TLDR
It is shown how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing and that a PicoServer performs comparably to a Pentium 4-like class machine while consuming only about 1/10 of the power.
Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors
TLDR
This work proposes a family of thermal herding techniques that reduces 3D power density and locates a majority of the power on the top die closest to the heat sink, which results in a 47.0% performance improvement and a 20% reduction in total power.
A performance comparison of contemporary DRAM architectures
TLDR
A simulation-based performance study of a representative group of DRAM architectures, each evaluated in a small-system organization, reveals that current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem.
Memory access scheduling
TLDR
This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.
Scalable Cache Miss Handling for High Memory-Level Parallelism
TLDR
This paper presents a novel scalable MHA design for high-MLP processors, which is hierarchical, with a small MSHR file per cache bank and a larger MSHR file shared by all banks, and uses a Bloom filter to reduce searches in the larger MSHR file.
Leveraging 3D Technology for Improved Reliability
TLDR
The possibility of providing redundancy with an older process technology, an unexplored and especially compelling application of die heterogeneity, is evaluated, and it is shown that the overhead of the second die can be reduced to a 3°C temperature increase or a 4% performance loss, while also providing higher error resilience.
Introspective 3D chips
TLDR
It is shown that hardware stubs could be inserted into commodity processors at design time that would allow analysis layers to be bonded to development chips, and that these stubs would increase area and power by no more than 0.021 mm² and 0.9%, respectively.