A Modern Primer on Processing in Memory
@article{Mutlu2020AMP, title={A Modern Primer on Processing in Memory}, author={Onur Mutlu and Saugata Ghose and Juan Gómez-Luna and Rachata Ausavarungnirun}, journal={ArXiv}, year={2020}, volume={abs/2012.03112} }
Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data…
64 Citations
PIM-Enclave: Bringing Confidential Computation Inside Memory
- Computer Science · ArXiv
- 2021
A novel Processing-in-Memory (PIM) design is presented as a data-intensive workload accelerator for confidential computing, providing side-channel-resistant secure computation offloading and running data-intensive applications with negligible performance overhead compared to the baseline PIM model.
PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM
- Computer Science · ACM Trans. Archit. Code Optim.
- 2023
PiDRAM, the first flexible end-to-end framework that enables system integration studies and evaluation of real, commodity-DRAM-based processing-using-memory (PuM) techniques, is designed and developed, and the paper describes how to solve key integration challenges to make such techniques work effectively on a real-system prototype.
Accelerating Neural Network Inference With Processing-in-DRAM: From the Edge to the Cloud
- Computer Science · IEEE Micro
- 2022
The analysis reveals that PIM greatly benefits memory-bound neural networks (NNs) and concludes that the ideal PIM architecture for an NN model depends on the model's distinct attributes and on the inherent architectural design choices of the PIM system.
Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture
- Computer Science · ArXiv
- 2021
This paper provides the first comprehensive analysis of the first publicly-available real-world PIM architecture and presents PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 memory-bound workloads from different application domains.
Casper: Accelerating Stencil Computations Using Near-Cache Processing
- Computer Science · IEEE Access
- 2023
Casper is a near-cache accelerator consisting of specialized stencil computation units connected to the last-level cache (LLC) of a traditional CPU, based on two key ideas: avoiding the cost of moving rarely reused data throughout the cache hierarchy, and exploiting the regularity of the data accesses and the inherent parallelism of stencil computations to increase overall performance.
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
- Computer Science · IEEE Access
- 2021
This work introduces DAMOV, a new methodology for characterizing data movement bottlenecks in applications, together with a benchmark suite for evaluating which workloads can benefit from near-data processing.
Intelligent Architectures for Intelligent Computing Systems
- Computer Science · 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)
- 2021
This invited special session talk describes three major shortcomings of modern architectures: in 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) exploiting different semantic properties of application data.
SIMDRAM: a framework for bit-serial SIMD processing using DRAM
- Computer Science · ASPLOS
- 2021
This paper proposes SIMDRAM, a flexible general-purpose processing-using-DRAM framework that enables the efficient implementation of complex operations and provides a flexible mechanism to support the implementation of arbitrary user-defined operations.
CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture
- Computer Science · FPGA
- 2023
This work identifies that the biggest system-throughput bottleneck results from the mismatch between the massive computation resources of one monolithic accelerator and the many small matrix-multiply (MM) layers in the application, and proposes the CHARM framework to compose multiple diverse MM accelerator architectures that work concurrently on different layers within one application.
Fundamentally Understanding and Solving RowHammer
- Computer Science · 2023 28th Asia and South Pacific Design Automation Conference (ASP-DAC)
- 2023
This work argues for two major directions to amplify research and development efforts: building a much deeper understanding of the RowHammer problem and its many dimensions, in both cutting-edge DRAM chips and computing systems deployed in the field, and designing and developing extremely efficient and fully-secure solutions via system-memory cooperation.
References
Showing 1-10 of 455 references
D-RaNGe: Using Commodity DRAM Devices to Generate True Random Numbers with Low Latency and High Throughput
- Computer Science · 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
- 2019
D-RaNGe is a methodology for extracting true random numbers from commodity DRAM devices with high throughput and low latency by deliberately violating the read access timing parameters, and it is evaluated using the commonly-used NIST statistical test suite for randomness.
The DRAM Latency PUF: Quickly Evaluating Physical Unclonable Functions by Exploiting the Latency-Reliability Tradeoff in Modern Commodity DRAM Devices
- Computer Science · 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)
- 2018
The DRAM latency PUF is introduced, a new class of fast, reliable DRAM PUFs that satisfy runtime-accessible PUF requirements and are quickly generated irrespective of operating temperature, using a real system with no additional hardware modifications.
Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM
- Computer Science · 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA)
- 2016
A new DRAM substrate, Low-Cost Inter-Linked Subarrays (LISA), is proposed, whose goal is to enable fast and efficient data movement across a large range of memory at low cost, and whose mechanisms provide a combined benefit higher than the benefit of each alone, on a variety of workloads and system configurations.
PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture
- Computer Science · 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)
- 2015
A new PIM architecture is proposed that does not change existing sequential programming models and automatically decides whether to execute PIM operations in memory or in processors depending on the locality of data, combining the best parts of conventional and PIM architectures by adapting to the data locality of applications.
ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs
- Computer Science · MICRO
- 2019
This is the first work to demonstrate in-memory computation with off-the-shelf, unmodified, commercial DRAM, achieved by violating the nominal timing specification and activating multiple rows in rapid succession, which leaves multiple rows open simultaneously and thereby enables bit-line charge sharing.
MAGIC—Memristor-Aided Logic
- Chemistry · IEEE Transactions on Circuits and Systems II: Express Briefs
- 2014
In this brief, a memristor-only logic family, memristor-aided logic (MAGIC), is presented; in each MAGIC logic gate, memristors serve as inputs with previously stored data, and an additional memristor serves as the output.
Memristor-based IMPLY logic design procedure
- Chemistry · 2011 IEEE 29th International Conference on Computer Design (ICCD)
- 2011
The design and behavior of a memristor-based logic gate, an IMPLY gate, are presented, and design issues such as the tradeoff between speed (fast write times) and correct logic behavior are described as part of an overall design methodology.
Memristor-Based Material Implication (IMPLY) Logic: Design Principles and Methodologies
- Computer Science · IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2014
The IMPLY logic gate, a memristor-based logic circuit, is described and a methodology for designing this logic family is proposed, based on a general design flow suitable for all deterministic memristive logic families.
Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost
- TACO
- 2016
Exploiting Near-Data Processing to Accelerate Time Series Analysis
- Computer Science · 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
- 2022
A time series is a chronologically ordered set of samples of a real-valued variable that can contain millions of observations. Time series analysis is used to analyze information in a wide variety of…