Ohm-GPU: Integrating New Optical Network and Heterogeneous Memory into GPU Multi-Processors

  • Jie Zhang, Myoungsoo Jung
  • Published 12 September 2021
  • Computer Science
  • MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
Traditional graphics processing units (GPUs) suffer from low memory capacity and a demand for high memory bandwidth. To address these challenges, we propose Ohm-GPU, a new optical-network-based heterogeneous memory design for GPUs. Specifically, Ohm-GPU can expand the memory capacity by combining a set of high-density 3D XPoint and DRAM modules as heterogeneous memory. To prevent memory channels from throttling the throughput of the GPU memory system, Ohm-GPU replaces the electrical lanes in the…


ZnG: Architecting GPU Multi-Processors with New Flash for Scalable Data Analysis
  • J. Zhang, Myoungsoo Jung
  • Computer Science
  • 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)
  • 2020
ZnG replaces all GPU internal DRAMs with an ultra-low-latency SSD to maximize the GPU memory capacity, and removes the performance bottleneck of the SSD by replacing its flash channels with a high-throughput flash network and integrating SSD firmware in the GPU’s MMU to reap the benefits of hardware acceleration.
NVMMU: A Non-volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures
NVMMU unifies two discrete software stacks (one for the SSD and the other for the GPU) in two major ways and can eliminate unnecessary user/kernel-mode switching, improve memory management, and remove data copy overheads.
Leveraging Silicon-Photonic NoC for Designing Scalable GPUs
This paper advocates using silicon-photonic link technology for on-chip communication in GPUs, and presents the first GPU-specific analysis of a cost-effective hybrid photonic crossbar NoC.
Integrating nanophotonics in GPU microarchitecture
This work employs silicon nanophotonics and 3D stacking technologies in GPU microarchitecture to provide higher communication bandwidth and lower-latency signaling mechanisms at reduced power, and proposes a novel interconnect-aware thread scheduling scheme to alleviate traffic congestion.
Performance characterization of a DRAM-NVM hybrid memory architecture for HPC applications using intel optane DC persistent memory modules
It is found that Optane-only executions are slower in terms of execution time than DRAM-only and Memory-mode executions by a minimum of 2 to 16% for VPIC and a maximum of 6x for LULESH, which means HPC mini-apps can now scale up their problem size given such a memory system.
Exploring Silicon Nanophotonics in Throughput Architecture
This work proposes silicon nanophotonics and 3-D stacking technologies in throughput architecture that provide higher communication bandwidth and lower-latency signaling mechanisms at reduced power, and anticipates that for emerging workloads and microarchitectures the implications are far-reaching.
DUANG: Fast and lightweight page migration in asymmetric memory systems
A novel resistive memory architecture sharing a set of row buffers between a pair of neighboring banks that enables two attractive techniques: migrating memory pages between slow and fast banks with little performance overhead and adaptively allocating more row buffers to busier banks based on memory access patterns.
OCDIMM: Scaling the DRAM Memory Wall Using WDM Based Optical Interconnects
OCDIMM (optically connected DIMM), a CPU-DRAM interface that takes advantage of multiwavelength optical interconnects, has at least three key benefits when compared to alternatives such as FBDIMM, which is used in products from Sun and Intel.
Re-architecting DRAM memory systems with monolithically integrated silicon photonics
This work redesigns the DRAM main memory system using a proposed monolithically integrated silicon photonics technology and shows that the photonically interconnected DRAM (PIDRAM) provides a promising solution to all of these issues.
FlashGPU: Placing New Flash Next to GPU Cores
A dynamic page-placement and buffer manager for Z-NAND subsystems is proposed that is aware of the bulk and parallel memory access characteristics of GPU applications, thereby offering high throughput and low energy consumption.