Scalable shared-memory multiprocessor architectures

@article{Thakkar1990ScalableSM,
  title={Scalable shared-memory multiprocessor architectures},
  author={Shreekant S. Thakkar and Michel Dubois and Anthony T. Laundrie and Gurindar S. Sohi and David V. James and Stein Gjessing and Manu Thapar and Bruce Delagi and Michael J. Carlton and Alvin M. Despain},
  journal={Computer},
  year={1990},
  volume={23},
  pages={71--74}
}
Directory-based and bus-based cache coherence schemes are defined and described. Directory-based schemes can be classified as centralized or distributed. Both categories support local caches to improve processor performance and reduce traffic in the interconnection. Schemes using presence flags, B pointers, and linked lists are discussed. Bus-based systems provide uniform memory access to all processors. This memory organization allows a simpler programming model, making it easier to develop… 

Hardware approaches to cache coherence in shared-memory multiprocessors. 2
TLDR
The coherence problem in multilevel cache hierarchies and large-scale, shared-memory multiprocessors and the principles of the two major groups of hardware protocols are discussed and relevant representatives are summarized.
A Distributed Hardware Mechanism for Process Synchronization on Shared-Bus Multiprocessors
TLDR
A new technique is presented that uses distributed hardware locking queues to reduce both contention and latency to the minimum values that can be obtained using a shared-bus.
Reflective-memory multiprocessor
TLDR
The paper describes the reflective memory implementation of the Encore Infinity, a distributed shared-memory multiprocessor that employs spinlocks to manage the herculean cache coherency problems that naturally arise in any system that employs massive replication.
Local-Area MultiProcessor: the scalable coherent interface
TLDR
A new architectural model, Local-Area Multiprocessor, is introduced and the general properties that an appropriate system architecture should have are considered, and practical design decisions are made.
Cache Coherence Protocols in Shared-Memory Multiprocessors
TLDR
This paper is a review of the recent research about the design of cache coherence protocols in shared-memory multiprocessors and focuses on snoopy coherence, which is simple and easy to implement, but relies on a low-latency, shared interconnection among the processors and the memory modules.
The M2 hierarchical multiprocessor
A scalable snoopy coherence scheme on distributed shared-memory multiprocessors
TLDR
A scalable snoopy scheme based on a single-hop-connected multiple-bus topology that can enjoy fast memory access and can also take advantage of greater scalability, and whose number of transceivers can grow naturally with the number of processors in the system.
Extending the scalable coherent interface for large-scale shared-memory multiprocessors
TLDR
This dissertation investigates ways to efficiently share frequently changing data among thousands of processors using Scalable Coherent Interface (SCI), and investigates two new cache-coherence protocols that employ trees of cache lines and have similar or lower latency than SCI.
Scalability Issues of Shared virtual Memory for Multicomputers
TLDR
This paper addresses issues and describes solutions of how to take advantage of the fast data transmission among memories to implement large SVM address spaces on large-scale multicomputers.
The M² hierarchical multiprocessor
TLDR
The design and development of a bus-based hierarchical multiprocessor named M² is discussed, which features a much higher degree of scalability than the shared-memory shared-bus architecture and exploits parallelism at both medium- and coarse-grain levels.

References

Hierarchical cache/bus architecture for shared memory multiprocessors
TLDR
The model indicates that a system of over 1000 usable MIPS can be constructed using high performance microprocessors and that the additional coherency protocol overhead introduced by the clustered approach is small.
The Cache Coherence Protocol of the Data Diffusion Machine
TLDR
The Data Diffusion Machine (DDM), a scalable shared memory multiprocessor in which the location of a datum in the machine is completely decoupled from its address, provides an automatic duplication and migration of the data to wherever needed.
The Wisconsin multicube: a new large-scale cache-coherent multiprocessor
TLDR
The Wisconsin Multicube is a large-scale, shared-memory multiprocessor architecture that employs a snooping cache protocol over a grid of buses and allows for a cache-coherent protocol for which most bus requests can be satisfied with no more than twice the number of bus operations required of a single-bus multi.
A New Solution to Coherence Problems in Multicache Systems
A memory hierarchy has coherence problems as soon as one of its levels is split in several independent units which are not equally accessible from faster levels or processors. The classical solution
Synapse tightly coupled multiprocessors: a new approach to solve old problems
TLDR
Using a non-write-through cache and the Synapse Expansion Bus, Synapse has designed a symmetric, tightly coupled multiprocessor system, capable of being expanded on line and under power from two through twenty-eight processors with a linear improvement in system performance.
Predicting the Performance of Shared Multiprocessor Caches
We investigate the performance of shared caches in a shared-memory multiprocessor executing parallel programs, and formulate simple models for estimating the load placed on the bus by such a shared
Analysis of cache invalidation patterns in multiprocessors
TLDR
This paper analyzes the cache invalidation patterns caused by several parallel applications and investigates the effect of these patterns on a directory-based protocol, and proposes a classification scheme for data objects found in parallel programs and links the invalidation traffic patterns observed in the traces back to these high-level objects.
Performance Evaluation of Wide Shared Bus Multiprocessors
TLDR
This work compares the simulated performance of a family of multiprocessor architectures based on a global shared memory with real-world performance of these architectures.
A characterization of sharing in parallel programs and its application to coherency protocol evaluation
TLDR
Simulation results indicate that (1) neither protocol dominates in performance; and (2) the write run model is a good predictor of protocol performance when the unit of the coherency operations matches that in the sharing analysis.