Soohong Ahn | Semantic Scholar

Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications

Computer Science, Engineering

1 January 2023

This work proposes a novel CXL-based memory disaggregation architecture with a real-world prototype demonstration, which overcomes the bandwidth limitation of the CXL interface using near-data processing.

IEEE

Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs

Shiju Li, Younghoon Min, Jongryool Kim

Computer Science, Engineering

arXiv.org

12 December 2025

This work presents the Acceleration of Indirect Memory Access (AIA) technique, a novel custom near-memory processing approach to optimizing SpGEMM on GPU HBM that demonstrates significant performance improvements over state-of-the-art methods.

arXiv

StreamDQ: HBM-Integrated On-the-Fly DeQuantization via Memory Load for Large Language Models

Minki Jeong, Daegun Yoon, Hoshik Kim

Computer Science, Engineering

IEEE Computer Architecture Letters

1 July 2025

This work proposes StreamDQ, a lightweight architectural enhancement for cloud-scale LLM inference that enables on-the-fly dequantization within the memory subsystem by integrating compact DeQuantization Blocks (DQBs) into the base die of high-bandwidth memory (HBM).

IEEE

Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications

Joonseop Sim, Soohong Ahn, Kyoung Park

Computer Science, Engineering

International Symposium on High-Performance…

2 March 2024

IEEE