Architecting phase change memory as a scalable dram alternative
Drawing on a fundamental understanding of PCM technology parameters, this work proposes area-neutral architectural enhancements that address PCM's limitations and make it competitive with DRAM.
Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors
- Yoongu Kim, Ross Daly, O. Mutlu
- Computer Science
- ACM/IEEE 41st International Symposium on Computer…
- 16 October 2014
This paper exposes the vulnerability of commodity DRAM chips to disturbance errors, and shows that repeatedly reading from the same address in DRAM, which repeatedly activates the same row, can corrupt data in nearby addresses.
Base-delta-immediate compression: Practical data compression for on-chip caches
- Gennady Pekhimenko, Vivek Seshadri, O. Mutlu, Phillip B. Gibbons, M. Kozuch, T. Mowry
- Computer Science
- 21st International Conference on Parallel…
- 19 September 2012
There is a need for a simple yet efficient compression technique that can effectively compress common in-cache data patterns, and has minimal effect on cache access latency.
RAIDR: Retention-aware intelligent DRAM refresh
- Jamie Liu, Ben Jaiyen, R. Veras, O. Mutlu
- Computer Science
- 39th Annual International Symposium on Computer…
- 9 June 2012
This paper proposes RAIDR (Retention-Aware Intelligent DRAM Refresh), a low-cost mechanism that uses knowledge of cell retention times to identify and skip unnecessary refreshes. RAIDR groups DRAM rows into retention time bins and applies a different refresh rate to each bin.
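The binning idea in this summary can be illustrated with a small sketch. It is not the paper's implementation: the three refresh intervals, the function names, and the per-row retention values are all hypothetical, chosen only to show how rows with longer retention times end up refreshed less often.

```python
from bisect import bisect_right

# Hypothetical refresh intervals in milliseconds, one per bin, shortest first.
# These values are assumptions for illustration, not taken from the source.
BIN_INTERVALS_MS = [64, 128, 256]


def assign_bin(retention_ms):
    """Return the longest refresh interval that does not exceed the row's retention time."""
    idx = bisect_right(BIN_INTERVALS_MS, retention_ms) - 1
    if idx < 0:
        raise ValueError("row retention is below the shortest refresh interval")
    return BIN_INTERVALS_MS[idx]


def group_rows(retention_by_row):
    """Group row ids into bins keyed by refresh interval."""
    bins = {interval: [] for interval in BIN_INTERVALS_MS}
    for row, retention_ms in retention_by_row.items():
        bins[assign_bin(retention_ms)].append(row)
    return bins


def refreshes_in_window(bins, window_ms):
    """Count row refreshes issued in window_ms when each bin uses its own interval."""
    return sum(len(rows) * (window_ms // interval) for interval, rows in bins.items())


# Hypothetical retention times (ms) for four rows.
retention = {0: 70, 1: 300, 2: 150, 3: 1000}
bins = group_rows(retention)
binned = refreshes_in_window(bins, 256)
baseline = len(retention) * (256 // BIN_INTERVALS_MS[0])
```

In this toy setup, refreshing every row at the shortest interval issues 16 row refreshes per 256 ms window, while the binned schedule issues 8.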
A scalable processing-in-memory accelerator for parallel graph processing
- Junwhan Ahn, Sungpack Hong, S. Yoo, O. Mutlu, Kiyoung Choi
- Computer Science
- ACM/IEEE 42nd Annual International Symposium on…
- 13 June 2015
This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Ramulator: A Fast and Extensible DRAM Simulator
This paper presents Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility, and is able to provide out-of-the-box support for a wide array of DRAM standards.
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
This paper proposes a parallelism-aware batch scheduler that seamlessly incorporates support for system-level thread priorities, can provide different service levels (including purely opportunistic service) to threads with different priorities, and is simpler to implement than STFM.
Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology
- Vivek Seshadri, Donghyuk Lee, T. Mowry
- Computer Science
- 50th Annual IEEE/ACM International Symposium on…
- 14 October 2017
This paper proposes Ambit, an accelerator-in-memory for bulk bitwise operations that largely exploits the existing DRAM structure and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area).
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior
- Yoongu Kim, Michael Papamichael, O. Mutlu, Mor Harchol-Balter
- Computer Science
- 43rd Annual IEEE/ACM International Symposium on…
- 4 December 2010
This paper presents Thread Cluster Memory scheduling (TCM), a new memory scheduling algorithm that addresses system throughput and fairness separately with the goal of achieving the best of both. Evaluated on a wide variety of multiprogrammed workloads and compared against four previously proposed scheduling algorithms, TCM achieves both the best system throughput and fairness.
A case for bufferless routing in on-chip networks
This paper makes a case for a new approach to designing on-chip interconnection networks that eliminates the need for buffers for routing or flow control, and describes new algorithms for routing without buffers in router input/output ports.