Using cache memory to reduce processor-memory traffic

  • J. Goodman
  • Published in ISCA '83, 1983
  • Computer Science
The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches which depend heavily on spatial locality (look-ahead) for their performance are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind… 
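The abstract's central claim is that a look-ahead cache (large blocks, spatial locality) can lower the miss ratio while still moving more data over the bus than a look-behind cache, because every miss fetches a whole block. A minimal simulation sketch of that trade-off, using a synthetic trace with mostly temporal locality rather than the paper's PDP-11/VAX traces (the trace parameters and cache sizes here are illustrative assumptions, not the paper's):

```python
import random

def simulate(trace, cache_words=1024, block_words=1):
    """Direct-mapped cache; returns (miss_ratio, bus_transfer_ratio).

    Bus transfer ratio = words moved between cache and main store per
    memory reference (write-backs ignored for simplicity)."""
    n_blocks = cache_words // block_words
    tags = [None] * n_blocks
    misses = 0
    for addr in trace:
        block = addr // block_words
        idx = block % n_blocks
        if tags[idx] != block:  # miss: fetch the whole block
            tags[idx] = block
            misses += 1
    refs = len(trace)
    return misses / refs, misses * block_words / refs

# Synthetic trace: repeated references to a small hot region (temporal
# locality) with occasional jumps to a new region -- an assumption made
# for illustration, not data from the paper.
random.seed(0)
trace, loc = [], 0
for _ in range(50_000):
    if random.random() < 0.05:
        loc = random.randrange(1 << 20)
    trace.append(loc + random.randrange(8))

for bw in (1, 2, 4, 8, 16):
    miss, bus = simulate(trace, block_words=bw)
    print(f"block={bw:2d} words  miss_ratio={miss:.3f}  bus_ratio={bus:.3f}")
```

On a trace like this, growing the block size drives the miss ratio down but the bus transfer ratio up, which is the burst-traffic effect the abstract argues makes look-ahead caches inappropriate for bandwidth-limited buses.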

Figures from this paper

Citations
Scalable and Efficient Algorithms for Unstructured Mesh Computations
A hybrid parallelization approach combining the distributed, shared, and vectorial forms of parallelism in a fine-grain task-based scheme applied to irregular structures; it has been ported to several industrial applications developed by Dassault Aviation and has yielded important speedups on standard multicores and the Intel Xeon Phi manycore.
WAYPOINT: scaling coherence to thousand-core architectures
WayPoint is introduced, a mechanism that dynamically increases directory associativity and capacity to achieve thousand-core scalability with smaller, less associative sparse directories, attaining performance within 4% of an infinitely large on-die directory.
Locally parallel cache design based on KL1 memory access characteristics
The memory access characteristics of KL1 parallel execution and a locally parallel cache mechanism with hardware lock are described, and new software-controlled memory access commands, named DW, ER, and RP, are introduced.
ASPEN: High-performance hardware support for distributed shared-memory
This thesis describes and evaluates an integrated memory and network subsystem designed to provide the abstraction of shared memory among workstations and presents additional experimental data that suggests Aspen is scalable to larger numbers of processors with comparable performance.
A class of compatible cache consistency protocols and their support by the IEEE futurebus
This paper defines a class of compatible consistency protocols supported by the current IEEE Futurebus design, referred to as the MOESI class of protocols, which has the property that any system component can select (dynamically) any action permitted by any protocol in the class, and be assured that consistency is maintained throughout the system.
Hardware techniques to improve the performance of the processor/memory interface
Hardware techniques to mitigate bandwidth-related performance losses are explored, and a hybrid called the Indirect Cache, which manages an on-chip cache much like a physical memory with its own page table and translation buffer, is evaluated.
CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators
CoNDA is proposed, a coherence mechanism that lets an NDA optimistically execute an NDA kernel under the assumption that the NDA has all necessary coherence permissions, while gathering information on the memory accesses performed by the NDA and by the rest of the system.
A Look at Computer Architecture Evaluation Methodologies
Based on the tools used by architects, six key papers that have been influential on past work and will likely continue to be influential in the future are identified.
LazyPIM: Efficient Support for Cache Coherence in Processing-in-Memory Architectures
LazyPIM is proposed, a new hardware cache coherence mechanism designed specifically for PIM that improves average performance across a range of data-intensive PIM applications by 19.6%, reduces off-chip traffic by 30.9%, and reduces energy consumption by 18.0%, over the best prior approaches to PIM coherence.
Micro-Sector Cache
Sectored DRAM caches that exercise large allocation units called sectors are studied; they invest reasonably small storage to maintain tag/state, enable space- and bandwidth-efficient tag/state caching, and minimize main memory bandwidth wastage by fetching only the useful portions of an allocated sector.
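Several of the citing works above build on bus-based coherence, and one of them names the MOESI class of protocols. A minimal single-cache-line transition table for a generic MOESI protocol (this is a textbook-style sketch of the protocol class, not the Futurebus specification; the event names are hypothetical):

```python
# Hypothetical MOESI transition table for one cache line in one cache.
# States: Modified, Owned, Exclusive, Shared, Invalid.
MOESI = {
    # (state, event) -> next state
    ("I", "local_read_no_sharers"): "E",
    ("I", "local_read_sharers"):    "S",
    ("I", "local_write"):           "M",
    ("E", "local_write"):           "M",   # silent upgrade, no bus traffic
    ("E", "remote_read"):           "S",
    ("E", "remote_write"):          "I",
    ("S", "local_write"):           "M",   # invalidates other copies
    ("S", "remote_write"):          "I",
    ("M", "remote_read"):           "O",   # owner supplies data, keeps dirty copy
    ("M", "remote_write"):          "I",
    ("O", "local_write"):           "M",
    ("O", "remote_write"):          "I",
}

def step(state, event):
    # Events with no listed transition leave the state unchanged
    # (e.g. a local read hit in M, O, E, or S).
    return MOESI.get((state, event), state)
```

The Owned state is what distinguishes MOESI from MESI: a dirty line can be shared without first writing it back, so a snooping cache supplies the data directly, which is exactly the kind of processor-memory traffic reduction the original paper argues for.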


Figure captions:
  • Bus Transfer and Miss Ratios vs. Cache Size; blocks are 4 bytes; PDP-11 traces. The bus transfer ratio is the number of transfers between cache and main store relative to …
  • Bus Transfer Ratio vs. Block Size for warm and cold starts; PDP-11 traces.
  • Bus Transfer and Miss Ratios vs. Cache Size; 4-byte blocks; VAX-11 traces.
  • Bus Transfer Ratio vs. Block Size for warm and cold starts.

"While we were interested in this for a single-chip microcomputer of the future, we have also demonstrated that such an approach is feasible for one or more currently popular commercial markets."