A 1.2 V 20 nm 307 GB/s HBM DRAM With At-Speed Wafer-Level IO Test Scheme and Adaptive Refresh Considering Temperature Distribution

  • Kyomin Sohn, Won-Joo Yun, Reum Oh, Chi Sung Oh, Seong-Young Seo, Min-Sang Park, Dong-Hak Shin, Won-Chang Jung, Sang-Hoon Shin, Je-Min Ryu, Hye-Seung Yu, Jae-Hun Jung, Hyunui Lee, Seok-Yong Kang, Young-Soo Sohn, Jung Hwan Choi, Yong-Cheol Bae, Seong-Jin Jang, G. Y. Jin
  • IEEE Journal of Solid-State Circuits
A 1.2 V 20 nm 307 GB/s high-bandwidth memory (HBM) DRAM is presented to satisfy the high-bandwidth requirements of high-performance computing applications. The HBM is composed of a buffer die and multiple core dies, and each core die has an 8 Gb DRAM cell array with an additional 1 Gb ECC array. An at-speed wafer-level μ-bump IO test scheme and an adaptive refresh scheme that considers the temperature distribution are proposed to guarantee test coverage and stable operation in a power-efficient manner.
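The headline bandwidth follows from simple interface arithmetic. The sketch below assumes the standard 1024-bit HBM interface (8 channels of 128 DQ bits each) and a per-pin data rate of 2.4 Gb/s; neither number is stated above, so treat them as assumptions that happen to reproduce the 307 GB/s figure. It also computes the ECC capacity overhead implied by the 8 Gb + 1 Gb split.

```python
# Back-of-envelope check of the 307 GB/s figure.
# ASSUMPTIONS (not stated in the abstract): 1024-bit HBM interface made of
# 8 channels x 128 DQ bits, running at 2.4 Gb/s per pin.
CHANNELS = 8
DQ_BITS_PER_CHANNEL = 128
PIN_RATE_GBPS = 2.4  # assumed per-pin data rate in Gb/s


def peak_bandwidth_gbytes(channels: int, bits_per_channel: int, pin_rate_gbps: float) -> float:
    """Peak interface bandwidth in GB/s (1 GB = 1e9 bytes)."""
    total_bits = channels * bits_per_channel  # interface width in bits
    return total_bits * pin_rate_gbps / 8     # bits/s -> bytes/s


def ecc_overhead(data_gbit: float, ecc_gbit: float) -> float:
    """Extra array capacity spent on ECC, relative to the data array."""
    return ecc_gbit / data_gbit


bw = peak_bandwidth_gbytes(CHANNELS, DQ_BITS_PER_CHANNEL, PIN_RATE_GBPS)
print(f"peak bandwidth: {bw:.1f} GB/s")                          # peak bandwidth: 307.2 GB/s
print(f"ECC overhead per core die: {ecc_overhead(8, 1):.1%}")    # ECC overhead per core die: 12.5%
```

Under these assumptions, 1024 bits × 2.4 Gb/s ÷ 8 gives 307.2 GB/s, and the 1 Gb ECC array adds 12.5% capacity on top of the 8 Gb data array.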
iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture
  • P. Gu, Xinfeng Xie, +4 authors Yuan Xie
  • Computer Science
    2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA)
  • 2020
This work proposes iPIM, the first programmable in-memory image processing accelerator using a near-bank architecture. It introduces the SIMB (Single-Instruction-Multiple-Bank) ISA to enable flexible control flow and data access, and develops iPIM-aware compiler optimizations to improve performance.
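The SIMB idea (one instruction, many banks) can be illustrated with a toy model in which a single instruction is broadcast to every bank and each bank applies it to its own local data. The `Bank` class, the `simb_issue` function, the register name `r0`, and the per-bank mask used for divergent control flow are all hypothetical; this is a sketch of the concept, not the iPIM ISA.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class Bank:
    """Hypothetical model of one DRAM bank with a tiny local register file."""
    data: list
    regs: dict = field(default_factory=dict)


def simb_issue(banks: Sequence[Bank], op: Callable[[int, int], int], dst: str,
               addr: int, mask: Optional[Sequence[bool]] = None) -> None:
    """Broadcast one instruction to all banks; each bank reads its own data[addr].

    A bank whose mask entry is False skips the instruction, which is one simple
    way to model per-bank (divergent) control flow.
    """
    for i, bank in enumerate(banks):
        if mask is not None and not mask[i]:
            continue
        bank.regs[dst] = op(bank.regs.get(dst, 0), bank.data[addr])


def add(acc: int, x: int) -> int:
    return acc + x


banks = [Bank([1, 2]), Bank([3, 4]), Bank([5, 6]), Bank([7, 8])]
for addr in range(2):
    simb_issue(banks, add, "r0", addr)  # same instruction, bank-local operands
print([b.regs["r0"] for b in banks])    # [3, 7, 11, 15]

simb_issue(banks, add, "r0", 0, mask=[True, False, True, False])
print([b.regs["r0"] for b in banks])    # [4, 7, 16, 15]
```

One instruction stream drives four banks in parallel, which is the kind of bank-level parallelism a near-bank design aims to exploit.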
A 96-MB 3D-Stacked SRAM Using Inductive Coupling With 0.4-V Transmitter, Termination Scheme and 12:1 SerDes in 40-nm CMOS
A low-power, large-capacity 3D-stacked SRAM with 3-cycle latency for a DNN accelerator is achieved by combining the scaling of inductive-coupling technology with logic-process scaling, and it achieves more than 50% lower energy consumption.
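The 12:1 SerDes in the title converts 12 parallel bits into a serial stream, so the serial lane carries 12 bits per parallel-clock cycle. The following is a minimal behavioral sketch of that conversion, assuming LSB-first bit order; the function names are illustrative, and nothing here models the inductive-coupling transmitter or its timing.

```python
def serialize(words, ratio=12):
    """Flatten each `ratio`-bit parallel word into a serial bit list, LSB first (assumed order)."""
    bits = []
    for w in words:
        if not 0 <= w < (1 << ratio):
            raise ValueError(f"word {w} does not fit in {ratio} bits")
        bits.extend((w >> i) & 1 for i in range(ratio))
    return bits


def deserialize(bits, ratio=12):
    """Regroup a serial bit list into `ratio`-bit parallel words, LSB first."""
    if len(bits) % ratio:
        raise ValueError("bit stream length must be a multiple of the SerDes ratio")
    words = []
    for start in range(0, len(bits), ratio):
        w = 0
        for i, b in enumerate(bits[start:start + ratio]):
            w |= b << i
        words.append(w)
    return words


words = [0xABC, 0x123]
stream = serialize(words)
print(len(stream))                   # 24
print(deserialize(stream) == words)  # True
```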
Accelerating Monte Carlo Transport in the Trade-off of Performance and Power
Random (Monte Carlo) simulation based on particle transport theory is the main method for solving particle transport problems and is widely used in medicine and computational physics. In this work, we present a …
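The core of Monte Carlo transport is sampling exponentially distributed free-flight distances and randomly choosing collision outcomes. The sketch below is a textbook one-dimensional rod model, not the method of this work (its abstract is truncated). The cross section `sigma_t`, absorption probability `p_absorb`, and function name are illustrative assumptions. For a pure absorber the exact transmission is exp(-sigma_t · length), which gives a convenient sanity check.

```python
import math
import random


def rod_transmission(sigma_t: float, length: float, p_absorb: float,
                     n: int, seed: int = 0) -> float:
    """Estimate the fraction of particles transmitted through a 1D rod [0, length].

    sigma_t:  total macroscopic cross section (mean collisions per unit length)
    p_absorb: probability that a collision absorbs the particle; otherwise it
              scatters forward or backward with equal probability (1D rod model)
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n):
        x, direction = 0.0, 1.0  # enter at the left face, moving right
        while True:
            # Free-flight distance ~ Exponential(mean = 1 / sigma_t).
            x += direction * (-math.log(1.0 - rng.random()) / sigma_t)
            if x >= length:
                transmitted += 1  # escaped through the far face
                break
            if x < 0.0:
                break  # leaked back out through the entry face
            if rng.random() < p_absorb:
                break  # absorbed at the collision site
            direction = rng.choice((-1.0, 1.0))  # scatter forward or backward
    return transmitted / n


estimate = rod_transmission(sigma_t=1.0, length=1.0, p_absorb=1.0, n=100_000)
print(estimate, math.exp(-1.0))  # estimate should be close to 0.3679
```

Each particle history is independent, which is why this kind of workload parallelizes well and why performance/power trade-offs become the main design question.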
MPU: Towards Bandwidth-abundant SIMT Processor via Near-bank Computing
This work proposes MPU (Memory-centric Processing Unit), the first SIMT processor based on a 3D-stacking near-bank computing architecture. MPU adopts a hybrid pipeline that can offload instructions to near-bank compute logic, and a backend optimization is developed to make the instruction-offloading decision.
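To make the offloading decision concrete, here is a deliberately simple, hypothetical cost rule: offload an instruction to near-bank logic only when all of its DRAM operands sit in one bank and it is memory-bound (low arithmetic intensity). The `Instr` fields and the `max_flops_per_byte` threshold are invented for illustration and are not MPU's actual backend heuristic.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical per-instruction summary a compiler backend might compute."""
    name: str
    bytes_accessed: int         # DRAM operand bytes read or written
    flops: int                  # arithmetic work
    operands_in_one_bank: bool  # all DRAM operands live in the same bank


def should_offload(instr: Instr, max_flops_per_byte: float = 0.5) -> bool:
    """Hypothetical rule: offload memory-bound instructions with bank-local data."""
    if not instr.operands_in_one_bank:
        return False  # cross-bank operands would need data movement anyway
    if instr.bytes_accessed == 0:
        return False  # no DRAM traffic to save by computing near the bank
    return instr.flops / instr.bytes_accessed <= max_flops_per_byte


program = [
    Instr("vadd", bytes_accessed=12, flops=1, operands_in_one_bank=True),
    Instr("mma_tile", bytes_accessed=64, flops=128, operands_in_one_bank=True),
    Instr("gather", bytes_accessed=8, flops=0, operands_in_one_bank=False),
]
print([should_offload(i) for i in program])  # [True, False, False]
```

A real backend would weigh more factors (e.g., synchronization with the host pipeline); the point here is only that the decision reduces to comparing estimated data-movement savings against where the compute is cheaper.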
DLUX: A LUT-Based Near-Bank Accelerator for Data Center Deep Learning Training Workloads
  • P. Gu, Xinfeng Xie, +4 authors Yuan Xie
  • Computer Science
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • 2021
DLUX, a high-performance and energy-efficient 3D-PIM accelerator for DNN training based on the near-bank architecture, is proposed. It uses a small scratchpad buffer together with a lightweight transformation engine to exploit locality and enable flexible data layout without an expensive cache.
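The "LUT-based" idea can be illustrated by replacing multiplication with a table lookup. The sketch below precomputes every product of two unsigned 4-bit operands (a 256-entry table) and uses it for a dot product. The operand width and function names are assumptions for illustration; the abstract does not describe DLUX's actual LUT organization or number formats.

```python
def build_mul_lut(bits: int = 4) -> list:
    """Precompute a*b for all unsigned `bits`-bit operands; index = (a << bits) | b."""
    size = 1 << bits
    return [a * b for a in range(size) for b in range(size)]


def lut_mul(lut: list, a: int, b: int, bits: int = 4) -> int:
    """Multiply by table lookup instead of an arithmetic multiplier."""
    size = 1 << bits
    if not (0 <= a < size and 0 <= b < size):
        raise ValueError("operand out of range for this LUT")
    return lut[(a << bits) | b]


def lut_dot(lut: list, xs, ws, bits: int = 4) -> int:
    """Dot product where every multiply is a LUT read and only the adds are arithmetic."""
    return sum(lut_mul(lut, x, w, bits) for x, w in zip(xs, ws))


lut = build_mul_lut()
print(len(lut))                              # 256
print(lut_dot(lut, [1, 2, 3], [4, 5, 6]))    # 32
```

The trade-off is memory for logic: a table read is cheap in a DRAM-adjacent design, while the table size grows as 2^(2·bits), which is why this only suits narrow operands or decomposed wider ones.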
Genome Sequence Alignment - Design Space Exploration for Optimal Performance and Energy Architectures
This work proposes an architecture based on ARMv8 cores and demonstrates that 16 ARMv8 64-bit out-of-order (OoO) cores with HBM2 outperform 32 cores of an Intel Xeon Phi Knights Landing (KNL) processor with 3D-stacked memory.
A Classification of Memory-Centric Computing
A comprehensive classification of memory-centric computing architectures is presented, based on three metrics: computation location, level of parallelism, and the memory technology used. The classification unifies the terminology so that each architecture is uniquely identified, and it highlights potential future architectures that can be explored further.
An Area-Efficient and Wide-Range Inter-Signal Skew Compensation Scheme With the Embedded Bypass Control Register Operating as a Binary Search Algorithm for DRAM Applications
The proposed bypass control register, which operates as a binary search algorithm such as a successive approximation register (SAR), allows the digitally controlled delay line (DCDL) controller to be embedded in the delay line.
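A SAR-style binary search sets one delay-code bit per step, from MSB to LSB, keeping each bit only if a comparator (here a phase detector) reports that the delayed signal is still early. An n-bit code is therefore resolved in n comparisons instead of up to 2^n - 1 for a linear sweep. In the sketch below the phase detector is modeled as a hypothetical monotonic predicate `is_early(code)`, and the 5 ps step and 37 ps skew are made-up values; circuit details of the embedded register are not modeled.

```python
from typing import Callable


def sar_search(is_early: Callable[[int], bool], n_bits: int) -> int:
    """Return the largest code in [0, 2**n_bits - 1] for which is_early(code) is True.

    Assumes is_early is monotonic (True up to some threshold, False beyond it),
    as a phase detector comparing a delayed edge with a reference edge would be.
    Each loop iteration corresponds to one SAR comparison cycle.
    """
    code = 0
    for bit in reversed(range(n_bits)):  # MSB first
        trial = code | (1 << bit)        # tentatively set this bit
        if is_early(trial):              # delayed edge still before the reference
            code = trial                 # keep the bit
    return code


STEP_PS = 5    # hypothetical DCDL delay per code step
SKEW_PS = 37   # hypothetical inter-signal skew to compensate
code = sar_search(lambda c: c * STEP_PS <= SKEW_PS, n_bits=6)
print(code)  # 7  (35 ps of delay, the closest step not exceeding 37 ps)
```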
Gem5-X: A Gem5-Based System Level Simulation Framework to Optimize Many-Core Platforms
Gem5-X, a gem5-based system-level simulation framework, is presented together with a methodology to optimize many-core systems for performance and power; the potential benefits of architectural extensions such as in-cache computing and 3D-stacked High Bandwidth Memory are also presented.