The processor-memory bottleneck: problems and solutions

Nihar R. Mahapatra and Balakrishna V. Venkatrao
The rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM (Dynamic Random Access Memory) speed. Although the disparity between processor and memory speed is already an issue, it will become a much bigger one in the future. Computer designers therefore face a growing processor-memory performance gap [1], which is now the primary obstacle to improved computer system performance. This article examines this problem and its various solutions.
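To make the compounding effect concrete, the following Python sketch computes the ratio of processor to DRAM performance over time, with both normalized to 1.0 in year 0. The annual improvement rates (55% for processors, 7% for DRAM) are illustrative assumptions, not figures from this article. `performance_gap` is a hypothetical helper.

```python
def performance_gap(years: int, cpu_rate: float = 0.55, dram_rate: float = 0.07) -> float:
    """Ratio of processor to DRAM performance after `years` of compounding.

    cpu_rate and dram_rate are ASSUMED annual improvement rates chosen for
    illustration; they are not taken from the article.
    """
    cpu = (1 + cpu_rate) ** years    # normalized processor performance
    dram = (1 + dram_rate) ** years  # normalized DRAM performance
    return cpu / dram


if __name__ == "__main__":
    for y in (0, 5, 10, 20):
        print(f"year {y:2d}: gap = {performance_gap(y):8.1f}x")
```

The ratio grows geometrically. Under these assumed rates, even a modest difference in annual improvement widens the gap by orders of magnitude within two decades.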
Reducing Memory Bottlenecks in Embedded, Parallel Image Processors
This work addresses the problem of memory access bottlenecks in parallel digital image processors and presents a solution that achieves up to a 93.4% reduction compared with standard sequential methods.
A Survey on Computer System Memory Management
Computer memory is central to the operation of a modern computer system; it stores data or program instructions on a temporary or permanent basis for use in a computer. In this paper, various memory…
Design Strategy of Cache Memory for Computer Performance Improvement
A cache memory, sometimes called a cache store or RAM cache, is fundamentally a portion of memory made of high-speed static RAM (SRAM) instead of the slower and cheaper dynamic RAM (DRAM) used for…
Reducing processor-memory performance gap and improving network-on-chip throughput
This thesis proposes modifications to the storage engine (SE) of a DBMS that provide fast access to data by bypassing slow disk interfaces while retaining all the functionality of a robust DBMS. It also proposes a selection scheme that switches the routing algorithm of a network-on-chip (NoC) as the traffic pattern of an application changes.
Memory Latency Reduction via Thread Throttling
A memory thread throttling mechanism dynamically tunes the number of allowable memory threads as the workload varies, improving system performance by a geometric mean of 12% for real-world applications on the same hardware.
A Novel Prefetch Technique for High Performance Embedded System
The proposed prefetch technique fetches data from main memory before it is actually requested, hiding the long main-memory latency.
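The idea of fetching data before it is requested can be illustrated with a simple stride prefetcher. The Python sketch below is not the technique from the cited paper. It assumes a small fully associative LRU cache, a 64-byte block size, and a single-stream prefetcher that issues one prefetch whenever two consecutive block-address deltas match. All names (`LRUCache`, `count_misses`, `BLOCK`) are hypothetical.

```python
from collections import OrderedDict

BLOCK = 64  # assumed cache-block size in bytes


class LRUCache:
    """Fully associative cache of block numbers with LRU replacement."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()  # block number -> True, oldest first

    def insert(self, block: int) -> None:
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return
        if len(self.blocks) >= self.num_blocks:
            self.blocks.popitem(last=False)  # evict the least-recently used block
        self.blocks[block] = True

    def access(self, block: int) -> bool:
        """Demand access: return True on a hit; on a miss, fill the block."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.insert(block)
        return False


def count_misses(trace, num_blocks: int = 8, prefetch: bool = False) -> int:
    """Count demand misses for a byte-address trace, optionally with a stride prefetcher."""
    cache = LRUCache(num_blocks)
    misses = 0
    last_block = None
    stride = None
    for addr in trace:
        block = addr // BLOCK
        if not cache.access(block):
            misses += 1
        # Only track strides between distinct blocks.
        if prefetch and last_block is not None and block != last_block:
            delta = block - last_block
            if delta == stride:
                # Same stride seen twice in a row: fetch the next block early.
                cache.insert(block + delta)
            stride = delta
        last_block = block
    return misses
```

On a purely sequential trace of 100 blocks, only the first three accesses remain demand misses. Real prefetchers also pay for the extra memory traffic caused by inaccurate prefetches, which this sketch ignores.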
Architecting Memory Systems for Emerging Technologies
The advance of traditional dynamic random access memory (DRAM) technology has slowed down, while the capacity and performance needs of memory system have continued to increase. This is a result of…
A data dependency recovery system for a heterogeneous multicore processor
A software framework codenamed Lyuba is described that handles low-level data hazards and automatically recovers the application from them, without programmer intervention or speculation, on an asymmetric chip multicore processor.
Transparent memory hierarchy compression and migration
This dissertation presents several new operating system and architecture techniques that use elements of the virtual and physical memory system to improve the functionality, power consumption, and performance of embedded systems such as multimedia devices and wireless sensor network nodes.
Replacement techniques for improving performance in sub-block caches
  • O. Olorode, M. Nourani
  • 2014 48th Asilomar Conference on Signals, Systems and Computers
  • 2014
This work proposes intelligent sub-block cache replacement policies that use the valid state of individual sub-blocks in replacement decisions at the super-block level to improve performance in sub-blocking architectures.
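One way a replacement decision can use sub-block valid state is sketched below in Python. Among the super-blocks in a set, the policy evicts the one with the fewest valid sub-blocks and breaks ties by least-recent use. This is a hypothetical policy chosen for illustration, not necessarily one of the policies proposed in the cited work. `SuperBlock`, `lru_victim`, and `choose_victim` are invented names.

```python
from dataclasses import dataclass


@dataclass
class SuperBlock:
    tag: int
    valid: list[bool]  # one valid bit per sub-block
    last_used: int     # timestamp of the most recent access


def lru_victim(ways: list[SuperBlock]) -> int:
    """Baseline: index of the least-recently used super-block."""
    return min(range(len(ways)), key=lambda i: ways[i].last_used)


def choose_victim(ways: list[SuperBlock]) -> int:
    """Hypothetical valid-aware policy: fewest valid sub-blocks first, then LRU."""
    return min(range(len(ways)), key=lambda i: (sum(ways[i].valid), ways[i].last_used))


if __name__ == "__main__":
    ways = [
        SuperBlock(tag=0x1, valid=[True, True, False, False], last_used=5),
        SuperBlock(tag=0x2, valid=[True, False, False, False], last_used=9),
        SuperBlock(tag=0x3, valid=[True, False, False, False], last_used=3),
        SuperBlock(tag=0x4, valid=[True, True, True, True], last_used=1),
    ]
    print("LRU victim:", lru_victim(ways))
    print("valid-aware victim:", choose_victim(ways))
```

In this example, plain LRU would evict the fully valid super-block. The valid-aware rule instead evicts a mostly invalid one, which holds less useful data.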


A case for intelligent RAM
The state of microprocessors and DRAMs today is reviewed, some of the opportunities and challenges for IRAMs are explored, and performance and energy efficiency of three IRAM designs are estimated.
A Case for Intelligent RAM: IRAM
Two trends call into question the current practice of microprocessors and DRAMs being fabricated as different chips on different fab lines: 1) the gap between processor and DRAM speed is growing at…
Dynamic base register caching: a technique for reducing address bus width
  • M. Farrens, A. Park
  • Proceedings of the 18th Annual International Symposium on Computer Architecture
  • 1991
Trace-driven simulations indicate that caching the higher-order portions of address references in a set of dynamically allocated base registers can significantly reduce processor-to-memory address bus width and increase available processor bandwidth.
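The base-register idea can be sketched in Python as a pair of identical register files, one on each side of the bus. On a hit, only a register index and the low-order address bits cross the bus. On a miss, the full address is sent and both sides load its high-order part into the least-recently used register. The 16-bit split, the four registers, and LRU replacement are assumptions for illustration, not parameters from the cited paper. `BaseRegisterFile` and `simulate` are hypothetical names.

```python
from collections import OrderedDict

LOW_BITS = 16                   # assumed: low-order bits sent on every reference
LOW_MASK = (1 << LOW_BITS) - 1
NUM_REGS = 4                    # assumed number of base registers per side


class BaseRegisterFile:
    """Base registers caching high-order address parts, replaced in LRU order."""

    def __init__(self, num_regs: int = NUM_REGS):
        self.num_regs = num_regs
        self.regs = OrderedDict()  # high-order part -> register index, oldest first

    def _allocate(self, high: int) -> None:
        if len(self.regs) < self.num_regs:
            index = len(self.regs)                    # use a free register
        else:
            _, index = self.regs.popitem(last=False)  # reuse the LRU register
        self.regs[high] = index

    def encode(self, addr: int):
        """Processor side: short message on a hit, full address on a miss."""
        high, low = addr >> LOW_BITS, addr & LOW_MASK
        if high in self.regs:
            self.regs.move_to_end(high)
            return ("hit", self.regs[high], low)
        self._allocate(high)
        return ("miss", addr)

    def decode(self, msg) -> int:
        """Memory side: rebuild the address and apply the same register update."""
        if msg[0] == "hit":
            _, index, low = msg
            high = next(h for h, i in self.regs.items() if i == index)
            self.regs.move_to_end(high)
            return (high << LOW_BITS) | low
        addr = msg[1]
        self._allocate(addr >> LOW_BITS)
        return addr


def simulate(trace, num_regs: int = NUM_REGS) -> int:
    """Return the number of references that needed only the narrow bus."""
    cpu, mem = BaseRegisterFile(num_regs), BaseRegisterFile(num_regs)
    hits = 0
    for addr in trace:
        msg = cpu.encode(addr)
        if mem.decode(msg) != addr:
            raise RuntimeError("register files out of sync")
        hits += msg[0] == "hit"
    return hits
```

Both sides apply the same deterministic update, so they stay synchronized without extra messages. However, a workload that cycles round-robin through more distinct high-order regions than there are registers misses on every reference under LRU.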
Dynamic Access Ordering: Bounds on Memory Bandwidth
A model of stream memory controller (SMC) startup costs is introduced, and the uniprocessor SMC models are extended to describe performance for modest-sized symmetric multiprocessor (SMP) SMC systems.
Memory Bandwidth Limitations of Future Microprocessors
It is predicted that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips, and pin bandwidth limitations will make more complex on-chip caches cost-effective.
Information content of CPU memory referencing behavior
Techniques are developed for analyzing the effectiveness of the addressing architecture and Memory/CPU traffic of existing machines with respect to the information-theoretic bound for a given trace.
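A crude way to see how much redundancy an address trace contains is to compare two entropies: that of the raw addresses and that of the differences between consecutive addresses. The Python sketch below computes the zeroth-order empirical entropy of each. It is a simple illustration, not the analysis technique of the cited paper. `empirical_entropy` and `delta_entropy` are hypothetical helpers.

```python
import math
from collections import Counter


def empirical_entropy(symbols) -> float:
    """Zeroth-order Shannon entropy, in bits per symbol, of a finite sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def delta_entropy(trace) -> float:
    """Entropy of the differences between consecutive addresses in a trace."""
    deltas = [b - a for a, b in zip(trace, trace[1:])]
    return empirical_entropy(deltas)


if __name__ == "__main__":
    trace = list(range(0, 4 * 256, 4))  # 256 consecutive 4-byte words
    print("raw addresses:", empirical_entropy(trace), "bits/reference")
    print("deltas:", delta_entropy(trace), "bits/reference")
```

Consider a trace that walks 256 consecutive words. Its raw addresses carry 8 bits per reference, while its constant stride carries none. Both figures are far below the 32 bits per reference that an assumed 32-bit address bus would spend.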
Creating a wider bus using caching techniques
  • D. Citron, L. Rudolph
  • Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture
  • 1995
Simulations have shown that over 90% of all information transferred can be sent in a single cycle when a 32-bit processor is connected by a 16-bit-wide bus to a 32-bit memory module.
Hitting the memory wall: implications of the obvious
This note argues that, because microprocessor speed is improving much faster than DRAM speed, average memory access time will eventually be dominated by DRAM latency even with very high cache hit rates, so that system performance will hit a "memory wall".
Computer Architecture: A Quantitative Approach
This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important…
A text-compression-based method for code size minimization in embedded systems
This work addresses the problem of code-size minimization in VLSI systems with embedded DSP processors using data-compression methods, and describes two methods that have different performance characteristics and different degrees of freedom in compressing the code.