Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications
- Joonseop Sim, Soohong Ahn, Kyoung Park
- 1 January 2023
Computer Science, Engineering
This work proposes a novel CXL-based memory disaggregation architecture with a real-world prototype demonstration, which overcomes the bandwidth limitation of the CXL interface using near-data processing.
Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs
- Shiju Li, Younghoon Min, Jongryool Kim
- 12 December 2025
Computer Science, Engineering
This work presents the Acceleration of Indirect Memory Access (AIA) technique, a novel custom near-memory processing approach to optimizing SpGEMM on GPU HBM that demonstrates significant performance improvements over state-of-the-art methods.
StreamDQ: HBM-Integrated On-the-Fly DeQuantization via Memory Load for Large Language Models
- Minki Jeong, Daegun Yoon, Hoshik Kim
- 1 July 2025
Computer Science, Engineering
This work proposes StreamDQ, a lightweight architectural enhancement for cloud-scale LLM inference that enables on-the-fly dequantization within the memory subsystem by integrating compact DeQuantization Blocks (DQBs) into the base die of high-bandwidth memory (HBM).
Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications
- Joonseop Sim, Soohong Ahn, Kyoung Park
- 2 March 2024
Computer Science, Engineering
This work proposes a novel CXL-based memory disaggregation architecture with a real-world prototype demonstration, which overcomes the bandwidth limitation of the CXL interface using near-data processing.