A Case for Intelligent RAM: IRAM
@inproceedings{Patterson1997ACF,
  title={A Case for Intelligent RAM: IRAM},
  author={David A. Patterson and Thomas E. Anderson and Neal Cardwell and Richard Fromm and K. Keeton and Christoforos E. Kozyrakis and R. Thomas and Katherine A. Yelick},
  year={1997}
}
Two trends call into question the current practice of microprocessors and DRAMs being fabricated as different chips on different fab lines: 1) the gap between processor and DRAM speed is growing at 50% per year; and 2) the size and organization of memory on a single DRAM chip is becoming awkward to use in a system, yet size is growing at 60% per year. Intelligent RAM, or IRAM, merges processing and memory into a single chip to lower memory latency, increase memory bandwidth, and improve energy…
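The two growth rates in the abstract compound year over year. The short Python sketch below shows how quickly they accumulate, assuming both rates stay constant (the abstract presents them only as current trends). `compound_growth` is a hypothetical helper, not something from the paper.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Multiplicative factor after `years` of compounding at `rate_per_year`."""
    return (1.0 + rate_per_year) ** years


# Rates taken from the abstract; holding them constant is an assumption.
GAP_RATE = 0.50       # processor-DRAM speed gap grows 50% per year
CAPACITY_RATE = 0.60  # DRAM chip capacity grows 60% per year

for years in (1, 5, 10):
    gap = compound_growth(GAP_RATE, years)
    capacity = compound_growth(CAPACITY_RATE, years)
    print(f"{years:2d} years: gap x{gap:.1f}, capacity x{capacity:.1f}")
```

Under that assumption, a decade at 50% per year widens the processor-DRAM gap by roughly 58×, while a decade at 60% per year grows per-chip capacity by roughly 110×.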
207 Citations
Exploiting ILP in page-based intelligent memory
- Computer Science | MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture
- 1999
This study compares the speed, area, and power of different implementations of Active Pages, an intelligent memory system that helps bridge the growing gap between processor and memory performance by associating simple functions with each page of data. It shows that instruction-level parallelism is the key to the earlier success of Active Pages implemented with reconfigurable logic.
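To make "associating simple functions with each page of data" concrete, here is a toy Python model. The class and method names (`ActivePage`, `ActiveMemory`, `bind`, `run_all`) are hypothetical and do not reflect the real Active Pages interface; the model ignores timing, area, and power entirely and only shows the data-plus-function pairing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical toy model: each page of data may carry one simple function.
PageFunction = Callable[[List[int]], int]


@dataclass
class ActivePage:
    data: List[int]
    function: Optional[PageFunction] = None

    def invoke(self) -> int:
        # Conceptually the function runs next to the data, so only the small
        # result, not the whole page, would need to cross the memory bus.
        if self.function is None:
            raise ValueError("no function bound to this page")
        return self.function(self.data)


class ActiveMemory:
    def __init__(self) -> None:
        self.pages: Dict[int, ActivePage] = {}

    def write_page(self, page_id: int, data: List[int]) -> None:
        self.pages[page_id] = ActivePage(list(data))

    def bind(self, page_id: int, fn: PageFunction) -> None:
        self.pages[page_id].function = fn

    def run_all(self) -> Dict[int, int]:
        # Invoke every page that has a bound function and collect the results.
        return {pid: page.invoke() for pid, page in self.pages.items()
                if page.function is not None}


mem = ActiveMemory()
mem.write_page(0, [1, 2, 3])
mem.write_page(1, [10, 20])
mem.bind(0, sum)
mem.bind(1, max)
print(mem.run_all())  # {0: 6, 1: 20}
```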
ActiveOS: virtualizing intelligent memory
- Computer Science | Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040)
- 1999
The results indicate that paging and inter-chip communication can be scheduled to achieve high performance for applications that use Active Pages with minimal adverse effects to applications that only use conventional pages.
Performance implications of next generation PowerPC™ microprocessor cache architectures
- Computer Science | Proceedings IEEE COMPCON 97. Digest of Papers
- 1997
The results of an analysis done at the Motorola Computer Group are presented to characterize the required properties of a cache hierarchy for a next-generation G3 PowerPC superscalar low-power microprocessor.
Active memory operations
- Computer Science | ICS '07
- 2007
Through simulation, it is shown that active memory operations (AMOs) offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50× faster barriers, 12× faster spinlocks, 8.5-15× faster stream/array operations, and 3× faster database queries.
Decoupled access DRAM architecture
- Computer Science | Proceedings Innovative Architecture for Future Generation High-Performance Processors and Systems
- 1997
This paper discusses an approach to reducing memory latency in future systems where a single-chip DRAM/processor will not be feasible even in 10 years, e.g., systems requiring a large memory and/or many CPUs.
Empirical Study for Optimization of Power-Performance with On-Chip Memory
- Computer Science | ISHPC
- 2005
This study quantitatively examined the effectiveness of on-chip memory in an SH-4 processor by directly measuring the processor's real power consumption, and proposed an on-chip RAM architecture called SCIMA (software controllable integrated memory architecture), which enables DMA (direct memory access) transfers to the on-chip memory.
The Energy Efficiency of IRAM Architectures
- Computer Science, Engineering | Conference Proceedings. The 24th Annual International Symposium on Computer Architecture
- 1997
This work finds that IRAM memory hierarchies consume as little as 22% of the energy consumed by a conventional memory hierarchy for memory-intensive applications, while delivering comparable performance.
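Restated as a ratio, the 22% figure means the IRAM hierarchy spends several times less energy than the conventional one in the best case reported ("as little as"), not on average. A minimal sketch of that conversion; `energy_reduction_factor` is a hypothetical helper.

```python
def energy_reduction_factor(fraction_of_conventional: float) -> float:
    """How many times less energy is used when only this fraction of the baseline is needed."""
    return 1.0 / fraction_of_conventional


print(f"{energy_reduction_factor(0.22):.1f}x")  # 4.5x less energy than the conventional hierarchy
```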
Operating Systems Techniques for Parallel Computation in Intelligent Memory
- Computer Science | Parallel Process. Lett.
- 2002
This study examines operating system techniques that allow Active Page memories to share, or multiplex, embedded VLIW processors across multiple physical Active Pages, and finds that hardware costs of computational logic can be reduced from 31% of DRAM chip area to 12%, through multiplexing, without significant loss in performance.
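One way to read the drop from 31% to 12%: on a die of fixed size, the share left for DRAM grows from 69% to 88%. The sketch below assumes DRAM capacity scales linearly with the area devoted to it, which the summary does not state; `dram_capacity_gain` is a hypothetical helper.

```python
def dram_capacity_gain(logic_fraction_before: float, logic_fraction_after: float) -> float:
    """Ratio of die area left for DRAM after vs. before shrinking the logic share.

    Assumes a fixed die size and DRAM capacity proportional to DRAM area.
    """
    return (1.0 - logic_fraction_after) / (1.0 - logic_fraction_before)


gain = dram_capacity_gain(0.31, 0.12)
print(f"DRAM share rises from 69% to 88% of the die: about {gain:.2f}x more capacity")
```

Under that assumption, multiplexing buys roughly 1.28× more DRAM on the same die.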
Efficient management of memory hierarchies in embedded DRAM systems
- Computer Science | ICS '99
- 1999
This paper shows how embedded memory can provide both data storage and a caching capability equivalent to that of a more complex processor device. The results imply that designers of embedded and small-scale systems can achieve significant performance wins by using a combined processor/memory device together with the proposed memory system design.
The architecture of the DIVA processing-in-memory chip
- Computer Science | ICS '02
- 2002
The DIVA (Data IntensiVe Architecture) system incorporates a collection of Processing-In-Memory (PIM) chips as smart-memory co-processors to a conventional microprocessor. A PIM-based architecture with many such chips yields significantly higher performance than a multiprocessor of similar scale, at a much lower hardware cost.
References (10 of 33 shown)
Intelligent RAM (IRAM): chips that remember and compute
- Computer Science | 1997 IEEE International Solid-State Circuits Conference. Digest of Technical Papers
- 1997
IRAM is attractive because the gigabit DRAM chip has enough transistors for both a powerful processor and a memory big enough to contain whole programs and data sets, and it needs more metal layers to accelerate the long lines of 600 mm² chips.
Missing the Memory Wall: The Case for Processor/Memory Integration
- Computer Science | 23rd Annual International Symposium on Computer Architecture (ISCA'96)
- 1996
It is shown that processor memory integration can be used to build competitive, scalable and cost-effective MP systems and results from execution driven uni- and multi-processor simulations show that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor.
A four megabit Dynamic Systolic Associative Memory chip
- Computer Science | J. VLSI Signal Process.
- 1992
In a preliminary logic design of a 1024×4096 associative memory chip based on a 4 Mbit DRAM, the ∼6 transistors per sense amplifier in a conventional DRAM are expanded by ∼9 transistors per sense amplifier in the modified chip.
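The numbers in this summary can be checked directly: a 1024×4096 array holds exactly 4 Mbit, and reading "expanded by ∼9 transistors" as nine transistors added to the original ∼6 (an interpretation, not stated explicitly) gives ∼15 transistors per sense amplifier. The variable names below are illustrative only.

```python
rows, cols = 1024, 4096
bits = rows * cols
print(bits, bits / 2**20)  # 4194304 4.0 -> a "4 Mbit" array

# Approximate counts from the summary; treating the 9 as added on top of the 6
# is an assumption about how "expanded by" should be read.
dram_per_sense_amp = 6
added_per_sense_amp = 9
total = dram_per_sense_amp + added_per_sense_amp
print(total, total / dram_per_sense_amp)  # 15 2.5 -> about 2.5x the transistors per sense amplifier
```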
Combined DRAM and logic chip for massively parallel systems
- Computer Science | Proceedings Sixteenth Conference on Advanced Research in VLSI
- 1995
The paper gives an overview of the basic chip technology and organization, some projections on the future of EXECUBE-like PIM chips, and some lessons on why this technology should radically affect the way the authors ought to think about computer architecture.
The Energy Efficiency of IRAM Architectures
- Computer Science, Engineering | Conference Proceedings. The 24th Annual International Symposium on Computer Architecture
- 1997
This work finds that IRAM memory hierarchies consume as little as 22% of the energy consumed by a conventional memory hierarchy for memory-intensive applications, while delivering comparable performance.
Parallel processing RAM chip with 256 Mb DRAM and quad processors
- Computer Science | 1997 IEEE International Solid-State Circuits Conference. Digest of Technical Papers
- 1997
Parallel processing RAM (PPRAM) is an architectural framework for merged memory/logic application-specific standard products (ASSPs) that integrates onto a single chip a large amount of DRAM and a common network interface based on a common communication protocol.
Computational RAM: A Memory-SIMD Hybrid and Its Application to DSP
- Computer Science | 1992 Proceedings of the IEEE Custom Integrated Circuits Conference
- 1992
This paper describes the CoRAM architecture, a working 8 Kbit prototype, a full-scale CoRAM designed in a 4 Mbit DRAM process, and CoRAM applications.
A multimedia 32 b RISC microprocessor with 16 Mb DRAM
- Computer Science | 1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC
- 1996
This 32-bit microprocessor with 2 MB of on-chip DRAM targets multimedia applications that require a low-power embedded microprocessor and a large memory; it integrates 17M transistors in 19.7 mm².
A 7.68 GIPS 3.84 GB/s 1W parallel image processing RAM integrating a 16 Mb DRAM and 128 processors
- Computer Science | 1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC
- 1996
The large memory capacity and high-data-rate random access achieved by the chip's design techniques make the PIP-RAM suitable for image processing of large-scale, full-color pictures.
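The aggregate figures in the PIP-RAM title can be divided across its 128 processors. This sketch assumes the 7.68 GIPS and 3.84 GB/s are split evenly across processors and uses decimal units (1 GB/s = 1000 MB/s); the summary does not say how either figure was measured.

```python
total_gips = 7.68        # aggregate instruction rate from the title (GIPS)
total_gb_per_s = 3.84    # aggregate data rate from the title (GB/s)
processors = 128

# Even split across processors is an assumption.
mips_per_proc = total_gips * 1000 / processors          # about 60 MIPS each
mb_per_s_per_proc = total_gb_per_s * 1000 / processors  # about 30 MB/s each
bytes_per_instr = total_gb_per_s / total_gips           # 0.5 byte of data per instruction

print(f"{mips_per_proc:.1f} MIPS, {mb_per_s_per_proc:.1f} MB/s per processor, "
      f"{bytes_per_instr:.2f} B/instruction")
```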
A 1 MB, 100 MHz integrated L2 cache memory with 128b interface and ECC protection
- Engineering | 1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC
- 1996
The advent of 20 ns, 16 Mb DRAM technology has made a high-speed, single-chip 1 MB cache possible, replacing multiple SRAM and logic modules, saving board space, and reducing power.
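The title's 128b interface and 100 MHz clock imply a peak transfer rate. This sketch assumes one full-width transfer per clock cycle and that all 128 bits carry data (any separate ECC check bits are not counted); `peak_bandwidth_bytes_per_s` is a hypothetical helper, and sustained bandwidth would be lower.

```python
def peak_bandwidth_bytes_per_s(interface_bits: int, clock_hz: float) -> float:
    # Assumes one full-width data transfer per clock cycle and no bits lost to
    # ECC or protocol overhead, so this is an upper bound.
    return interface_bits / 8 * clock_hz


bw = peak_bandwidth_bytes_per_s(128, 100e6)
print(f"{bw / 1e9:.1f} GB/s")  # 1.6 GB/s
```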