Cache Memories

@article{Smith1982CacheM,
  title={Cache Memories},
  author={Alan Jay Smith},
  journal={ACM Comput. Surv.},
  year={1982},
  volume={14},
  pages={473--530}
}
  • A. Smith
  • Published 1 September 1982
  • Computer Science
  • ACM Comput. Surv.
design issues. Specific aspects of cache memories that are investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size. Our discussion includes other aspects of memory system architecture, including… 
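
As a rough illustration of how several of the parameters named in the abstract interact (cache size, line size, placement into sets, replacement, demand fetch), the following minimal Python sketch counts misses on an address trace. It is not taken from the survey: the function name `miss_ratio`, the synthetic trace, and the choice of LRU replacement are assumptions made purely for illustration.

```python
from collections import OrderedDict

def miss_ratio(addresses, cache_bytes, line_bytes, ways):
    """Miss ratio of a set-associative, demand-fetch cache with LRU replacement.

    Hypothetical helper for illustration; write policy and timing are ignored.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    # One OrderedDict per set; insertion order doubles as LRU order.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes          # which memory line the address falls in
        cache_set = sets[line % num_sets]  # placement: line number modulo set count
        if line in cache_set:
            cache_set.move_to_end(line)    # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict the least recently used line
            cache_set[line] = True
    return misses / len(addresses)

# Assumed synthetic trace: two sequential passes over 1 KiB of 4-byte words.
trace = [i * 4 for i in range(256)] * 2

for size in (512, 2048):
    print(size, miss_ratio(trace, cache_bytes=size, line_bytes=16, ways=2))
```

On this trace the 2048-byte cache holds the whole working set, so only the first pass misses. The 512-byte cache is too small for it, and with LRU replacement the second pass misses on every line as well.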

The V-Way Cache: Demand Based Associativity via Global Replacement

The proposed variable-way, or V-Way, set-associative cache achieves an average miss rate reduction of 13% on sixteen benchmarks from the SPEC CPU2000 suite, which translates into an average IPC improvement of 8%.

Stack-Based Single-Pass Cache Simulation

This chapter and the following chapter address the problem of simulating cache-based memory systems, which optimally requires measurement of the performance of a large number of cache designs.
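
One well-known way to evaluate many cache sizes in a single pass is LRU stack-distance processing. The Python sketch below shows that general idea for fully associative LRU caches only. Whether the chapter uses exactly this formulation is an assumption, and the function name `lru_miss_counts` is hypothetical.

```python
def lru_miss_counts(trace, max_lines):
    """Single pass over `trace`; returns {c: misses} for fully associative
    LRU caches holding c = 1..max_lines lines.

    Illustrative sketch only: the list-based stack costs O(n) per reference.
    """
    stack = []                     # LRU stack, most recently used line at index 0
    hist = [0] * (max_lines + 1)   # hist[d] = references found at stack depth d
    for line in trace:
        if line in stack:
            depth = stack.index(line) + 1   # 1-based stack distance
            stack.remove(line)
            if depth <= max_lines:
                hist[depth] += 1
        stack.insert(0, line)      # referenced line becomes most recently used
    # A cache of c lines hits exactly the references with stack distance <= c.
    misses, hits = {}, 0
    for c in range(1, max_lines + 1):
        hits += hist[c]
        misses[c] = len(trace) - hits
    return misses

print(lru_miss_counts([1, 2, 3, 1, 2, 3, 4, 1], max_lines=4))
```

The point of the technique is that one traversal of the trace yields the miss count for every cache size up to `max_lines`, instead of one simulation run per size.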

Functional Implementation Techniques for CPU Cache Memories

Some of the issues that are involved in the implementation of highly optimized cache memories are considered and the techniques that can be used to help achieve the increasingly stringent design targets and constraints of modern processors are surveyed.

Improving cache hit ratio by extended referencing cache lines

An algorithm is presented that improves the cache hit ratio by extending the use of locality of reference (spatial and temporal): reference flagging is applied to additional data lines already residing in the cache, not only to the line referenced by the processor.

Comprehensive Review of Data Prefetching Mechanisms

Instead of waiting for a cache miss to initiate a memory fetch, data prefetching anticipates such misses and issues a fetch to the memory system in advance of the actual memory reference.
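
To make the contrast with demand fetching concrete, here is a minimal Python sketch of one simple scheme, sequential next-line prefetch issued on a miss, in a small fully associative LRU cache. It is an assumption chosen for illustration, not a mechanism taken from the review: the name `count_misses` is hypothetical, and the model ignores timing by treating a prefetched line as available immediately.

```python
from collections import OrderedDict

def count_misses(lines, capacity, prefetch_next=False):
    """Count misses on a trace of line numbers in a fully associative LRU cache.

    With prefetch_next=True, every miss on line i also brings in line i + 1.
    Hypothetical sketch: memory latency and bandwidth are not modelled.
    """
    cache = OrderedDict()  # insertion order doubles as LRU order

    def install(line):
        if line in cache:
            cache.move_to_end(line)
            return
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used line
        cache[line] = True

    misses = 0
    for line in lines:
        if line in cache:
            cache.move_to_end(line)    # hit
        else:
            misses += 1
            install(line)              # demand fetch
            if prefetch_next:
                install(line + 1)      # anticipate the next sequential line
    return misses

sequential = list(range(16))
print(count_misses(sequential, capacity=4))                      # demand fetch only
print(count_misses(sequential, capacity=4, prefetch_next=True))  # with prefetch
```

On a purely sequential trace the prefetching variant misses on every other line, so the miss count halves. On traces with little sequential locality the extra fetches can instead evict useful lines.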

Cache Operations by MRU Change

The concept of MRU change is introduced and is shown to be useful in many aspects of cache design and performance evaluation, such as comparison of various replacement algorithms, improvement of prefetch algorithms, and speedup of cache simulation.

The Implementation and Evaluation of a Compiler-Directed Memory Interface

A novel memory interface architecture, called burst buffers, is described, which regularly attains more than a factor of two improvement in performance for media algorithms above a normal data cache using conventional DRAM technology.

Cache Memories for Data Flow Machines

Cache memories for dataflow machines are presented, and, in particular, four design principles for reducing the working set size of dataflow caches are introduced. They are (1) controlling the number

Caches versus object allocation

  • J. Liedtke
  • Computer Science
    Proceedings of the Fifth International Workshop on Object-Orientation in Operating Systems
  • 1996
Dynamic object allocation usually stresses the randomness of data memory usage; the variables of a dynamic cache working set are to some degree distributed stochastically in the virtual or physical

Lessons from Experimental Methodology of Cache Hierarchy Changes with the Memory Technology

Results of experiments indicate that more levels of cache do not necessarily mean better performance for all benchmarks, that the last-level cache miss rate has no direct connection with system performance, and that an exclusive cache hierarchy performs better on average than an inclusive one.

References

SHOWING 1-10 OF 255 REFERENCES

Lockup-free instruction fetch/prefetch cache organization

A cache organization is presented that essentially eliminates a penalty on subsequent cache references following a cache miss and has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped.

Performance of cache-based multiprocessors

An approximate model is developed to estimate the processor utilization and the speedup improvement provided by the caches, and these two parameters are essential to a cost-effective design.

Analysis of multiprocessor cache organizations with alternative main memory update policies

Queuing models were developed to analyze alternative main memory update policies in a multiprocessor system and results predicted by the models were validated by a set of simulations.

Cache Performance in the VAX-11/780

Measurements are reported including the hit ratios of data and instruction references, the rate of cache invalidations by I/O, and the amount of waiting time due to cache misses.

Cache memory systems for multiprocessor architecture

By appropriate cache system design, adequate memory system speed can be achieved to keep the processors busy and smaller cache memories are required for dedicated processors than for standard processors.

Cache memories for PDP-11 family computers

The concept of cache memory is introduced together with its major organizational parameters: size, associativity, block size, replacement algorithm, and write strategy, and simulation results are given showing how the performance of the cache varies with changes in these parameters.

Cache-based Computer Systems

A cache-based computer system employs a fast, small memory interposed between the usual processor and main memory that provides a smaller ratio of memory access times, and holds the processor idle while blocks of data are being transferred from main memory to cache rather than switching to another task.

Structural Aspects of the System/360 Model 85 II: The Cache

The cache, a high-speed buffer establishing a storage hierarchy in the Model 85, is discussed in depth in this part, since it represents the basic organizational departure from other SYSTEM/360

The cost and performance tradeoffs of buffered memories

The study indicates that the flagged registered swap algorithm is superior to three other common algorithms used and it is shown that when jobs are switched, a substantial number of memory requests are required before the buffer fills and gives a high hit ratio.

A bit-slice cache controller

An LSI bit-slice chip set is described which should reduce both controller cost and complexity of the cache controller and enable a memory designer to construct a wide variety of cache structures with a minimum number of components and interconnections.