Corpus ID: 18203189

A Case for Intelligent RAM: IRAM

@inproceedings{Patterson1997ACF,
  title={A Case for Intelligent RAM: IRAM},
  author={David A. Patterson and Thomas E. Anderson and Neal Cardwell and Richard Fromm and K. Keeton and Christoforos E. Kozyrakis and R. Thomas and Katherine A. Yelick},
  year={1997}
}
Two trends call into question the current practice of microprocessors and DRAMs being fabricated as different chips on different fab lines: 1) the gap between processor and DRAM speed is growing at 50% per year; and 2) the size and organization of memory on a single DRAM chip is becoming awkward to use in a system, yet size is growing at 60% per year. Intelligent RAM, or IRAM, merges processing and memory into a single chip to lower memory latency, increase memory bandwidth, and improve energy… 
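
As a back-of-the-envelope illustration of the first trend (simple arithmetic on the 50% per year figure quoted above, not an additional result from the paper): a gap that compounds at 50% per year doubles roughly every 1.7 years and grows by roughly 58× over a decade,

\[
\text{gap}(n) \;=\; \text{gap}(0)\cdot 1.5^{\,n},
\qquad
n_{\text{double}} \;=\; \frac{\ln 2}{\ln 1.5} \;\approx\; 1.7\ \text{years},
\qquad
1.5^{10} \;\approx\; 58 .
\]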

Citations

Exploiting ILP in page-based intelligent memory
TLDR
This study compares the speed, area, and power of different implementations of Active Pages, an intelligent memory system that helps bridge the growing gap between processor and memory performance by associating simple functions with each page of data, and shows that instruction-level parallelism is the key to the previous success with reconfigurable logic.
ActiveOS: virtualizing intelligent memory
  • M. Oskin, F. Chong, T. Sherwood
  • Computer Science
    Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors
  • 1999
TLDR
The results indicate that paging and inter-chip communication can be scheduled to achieve high performance for applications that use Active Pages, with minimal adverse effects on applications that use only conventional pages.
Performance implications of next generation PowerPC™ microprocessor cache architectures
  • J. Reinold
  • Computer Science
    Proceedings IEEE COMPCON 97. Digest of Papers
  • 1997
TLDR
The results of an analysis done at the Motorola Computer Group are presented to characterize the required properties of a cache hierarchy for a next-generation G3 PowerPC superscalar low-power microprocessor.
Active memory operations
TLDR
Through simulation it is shown that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50X faster barriers, 12X faster spinlocks, 8.5X-15X faster stream/array operations, and 3X faster database queries.
Decoupled access DRAM architecture
  • A. Veidenbaum, K. Gallivan
  • Computer Science
    Proceedings Innovative Architecture for Future Generation High-Performance Processors and Systems
  • 1997
TLDR
This paper discusses an approach to reducing memory latency in future systems where a single-chip DRAM/processor will not be feasible even in 10 years, e.g., systems requiring a large memory and/or many CPUs.
Empirical Study for Optimization of Power-Performance with On-Chip Memory
TLDR
This study quantitatively examined the effectiveness of the on-chip memory in an SH-4 processor by directly measuring the real power of the processor, and proposed an on-chip RAM architecture called SCIMA (software controllable integrated memory architecture) which enables DMA (direct memory access) transfer to the on-chip memory (see the sketch after this citation list for a generic illustration).
The Energy Efficiency of IRAM Architectures
  • R. Fromm, S. Perissakis, +5 authors K. Yelick
  • Computer Science
    Conference Proceedings. The 24th Annual International Symposium on Computer Architecture
  • 1997
TLDR
This work finds that IRAM memory hierarchies consume as little as 22% of the energy consumed by a conventional memory hierarchy for memory-intensive applications, while delivering comparable performance.
Operating Systems Techniques for Parallel Computation in Intelligent Memory
TLDR
This study examines operating system techniques that allow Active Page memories to share, or multiplex, embedded VLIW processors across multiple physical Active Pages, and finds that hardware costs of computational logic can be reduced from 31% of DRAM chip area to 12%, through multiplexing, without significant loss in performance.
Efficient management of memory hierarchies in embedded DRAM systems
TLDR
This paper shows how the embedded memory can be used to provide both data storage and a caching capability equivalent to a more complex processor device, implying that designers of embedded and small-scale systems can achieve significant performance wins through a combined processor/memory device together with the proposed memory system design.
The architecture of the DIVA processing-in-memory chip
TLDR
The DIVA (Data IntensiVe Architecture) system incorporates a collection of Processing-In-Memory chips as smart-memory co-processors to a conventional microprocessor; a PIM-based architecture with many such chips yields significantly higher performance than a multiprocessor of similar scale, at a much reduced hardware cost.
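
As a generic illustration of the software-controlled on-chip memory idea summarized in the SCIMA entry above, the sketch below stages a tile of data into an assumed on-chip scratchpad, computes on the on-chip copy, and writes it back under explicit software control. It is written in plain C; the ONCHIP_BASE address, ONCHIP_SIZE, and the dma_copy() helper are hypothetical stand-ins, not the SCIMA or SH-4 interface, and on real hardware the copies would be issued to a DMA engine rather than performed by the CPU loop shown here.

/* Minimal sketch of a software-managed on-chip memory (scratchpad).
 * ONCHIP_BASE, ONCHIP_SIZE, and dma_copy() are hypothetical, not the
 * SCIMA or SH-4 interface. */
#include <stddef.h>
#include <stdint.h>

#define ONCHIP_BASE  ((uint8_t *)0x20000000u)   /* assumed scratchpad address */
#define ONCHIP_SIZE  (128u * 1024u)             /* assumed scratchpad size    */

/* Stand-in for a DMA transfer: on real hardware this would program a DMA
 * engine and let the CPU continue; here it is modeled as a plain copy. */
static void dma_copy(void *dst, const void *src, size_t n)
{
    const uint8_t *s = (const uint8_t *)src;
    uint8_t *d = (uint8_t *)dst;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Stage a frequently reused tile into on-chip memory, compute against the
 * on-chip copy, then write the result back to DRAM, all under explicit
 * software control rather than through a hardware-managed cache. */
void process_tile(uint32_t *dram_data, size_t tile_words)
{
    uint32_t *tile  = (uint32_t *)ONCHIP_BASE;
    size_t    bytes = tile_words * sizeof(uint32_t);
    if (bytes > ONCHIP_SIZE)
        return;                            /* tile must fit in the scratchpad */

    dma_copy(tile, dram_data, bytes);      /* DRAM -> on-chip memory */
    for (size_t i = 0; i < tile_words; i++)
        tile[i] *= 2;                      /* work on the on-chip copy */
    dma_copy(dram_data, tile, bytes);      /* on-chip memory -> DRAM */
}

The point of the sketch is only that data movement between DRAM and the on-chip memory is explicit in software, which is what distinguishes this style of on-chip memory from a conventional hardware-managed cache.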

References

Showing 1-10 of 33 references
Intelligent RAM (IRAM): chips that remember and compute
TLDR
IRAM is attractive because the gigabit DRAM chip has enough transistors for both a powerful processor and a memory big enough to contain whole programs and data sets, and it needs more metal layers to accelerate the long lines of 600 mm² chips.
Missing the Memory Wall: The Case for Processor/Memory Integration
TLDR
It is shown that processor-memory integration can be used to build competitive, scalable, and cost-effective MP systems; results from execution-driven uni- and multiprocessor simulations show that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor.
A four megabit Dynamic Systolic Associative Memory chip
  • G. Lipovski
  • Computer Science
    J. VLSI Signal Process.
  • 1992
TLDR
In a preliminary logic design of a 1024×4096 associative memory chip based on a 4 Mbit DRAM, the ∼6 transistors per sense amplifier of the DRAM are expanded by ∼9 transistors per sense amplifier in the modified chip.
Combined DRAM and logic chip for massively parallel systems
TLDR
This paper overviews the basic chip technology and organization, offers some projections on the future of EXECUBE-like PIM chips, and closes with some lessons on why this technology should radically affect the way the authors think about computer architecture.
The Energy Efficiency of IRAM Architectures
  • R. Fromm, S. Perissakis, +5 authors K. Yelick
  • Computer Science
    Conference Proceedings. The 24th Annual International Symposium on Computer Architecture
  • 1997
TLDR
This work finds that IRAM memory hierarchies consume as little as 22% of the energy consumed by a conventional memory hierarchy for memory-intensive applications, while delivering comparable performance.
Parallel processing RAM chip with 256 Mb DRAM and quad processors
TLDR
Parallel processing RAM (PPRAM) is an architectural framework for merged memory/logic application-specific standard products (ASSPs) that integrates a large amount of DRAM and a network interface based on a common communication protocol onto a single chip.
Computational RAM: A Memory-SIMD Hybrid and Its Application to DSP
TLDR
This paper describes the CoRAM architecture, a working 8Kbit prototype, a full scale CoRAM designed in a 4Mbit DRAM process, and CoRAM applications.
A multimedia 32 b RISC microprocessor with 16 Mb DRAM
  • T. Shimizu, J. Korematu, +15 authors K. Saitoh
  • Computer Science
    1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC
  • 1996
TLDR
This 32 b microprocessor with on-chip 2 MB DRAM is intended for multimedia applications that require a low-power embedded microprocessor and a large memory; it integrates 17 M transistors in 19.7 mm².
A 7.68 GIPS 3.84 GB/s 1W parallel image processing RAM integrating a 16 Mb DRAM and 128 processors
  • Y. Aimoto, T. Kimura, +9 authors K. Koyama
  • Computer Science
    1996 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, ISSCC
  • 1996
TLDR
Large memory capacity and high-data-rate random access achieved by these techniques make the PIP-RAM suitable for image processing of large-scale, full-color pictures.
A 1 MB, 100 MHz integrated L2 cache memory with 128b interface and ECC protection
TLDR
The advent of 20 ns, 16 Mb DRAM technology has made a high-speed single-chip 1MB cache possible, replacing multiple SRAM and logic modules, saving board space and reducing power.