Performance Evaluation of the Slotted Ring Multiprocessor

@article{Barroso1995PerformanceEO,
  title={Performance Evaluation of the Slotted Ring Multiprocessor},
  author={Luiz Andr{\'e} Barroso and Michel Dubois},
  journal={IEEE Trans. Computers},
  year={1995},
  volume={44},
  pages={878-890}
}
As microprocessor speeds continue to improve at a very fast rate, the bandwidth requirements for system-level interconnections in multiprocessors may eventually rule out the use of shared buses, even for small-scale multiprocessors. On the other hand, high-speed unidirectional links are an emerging technology that has the potential to scale with microprocessor technology and could replace buses as the interconnection fabric for future multiprocessors. We evaluate the performance of the…
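The bandwidth argument can be made concrete with a back-of-the-envelope bound. The Python sketch below assumes a unidirectional slotted ring of `n` nodes in which every link forwards one slot per slot time, destinations are uniformly distributed over the other nodes, and a slot is freed at its destination. The function names and the `slot_rate` parameter are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def avg_hops_unidirectional(n: int) -> Fraction:
    """Mean hop count on a unidirectional ring of n nodes when the destination
    is chosen uniformly among the other n - 1 nodes (this equals n / 2)."""
    return Fraction(sum(range(1, n)), n - 1)


def peak_ring_throughput(n: int, slot_rate: float) -> float:
    """Upper bound on delivered messages per second (hypothetical model).

    Assumes each of the n links forwards slot_rate slots per second and a slot
    is released at its destination, so one message occupies avg_hops link-slots.
    """
    return n * slot_rate / float(avg_hops_unidirectional(n))


if __name__ == "__main__":
    # Throughput in units of the per-link slot rate, for growing ring sizes.
    for n in (4, 8, 16, 32):
        print(n, avg_hops_unidirectional(n), peak_ring_throughput(n, 1.0))
```

Under these assumptions the bound is twice the per-link slot rate for every ring size, so the ring's advantage over a bus has to come from clocking point-to-point links faster than a shared bus can be clocked.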

Performance Issues in Ring-Based Multiprocessor Systems

This paper presents a detailed and realistic analysis of the effects of each component in ring-based multiprocessor systems, which system architects need in order to select cost-effective components for ring-based multiprocessor systems.

Performance Analysis of the Bidirectional Ring-Based Multiprocessor

A performance analysis of the proposed architecture is provided by deriving an analytical model and studying the effects of varying system and workload parameters, and the performance of the bidirectional ring architecture is compared against the unidirectional slotted ring model.
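One reason a bidirectional ring can help is shorter paths. As a rough illustration, the sketch below assumes uniformly distributed destinations and, on the bidirectional ring, routing in whichever direction is shorter; these are assumptions of the sketch, not necessarily those of the analytical model. Under them the mean hop count drops from about N/2 to about N/4.

```python
from fractions import Fraction


def avg_hops(n: int, bidirectional: bool) -> Fraction:
    """Mean hop count to a destination chosen uniformly among the other n - 1
    nodes of an n-node ring. A bidirectional ring is assumed to route each
    message in whichever direction is shorter."""
    total = 0
    for d in range(1, n):  # d = distance to the destination in the forward direction
        total += min(d, n - d) if bidirectional else d
    return Fraction(total, n - 1)


if __name__ == "__main__":
    for n in (8, 16, 32):
        uni, bi = avg_hops(n, False), avg_hops(n, True)
        print(f"n={n}: unidirectional {float(uni):.2f}, bidirectional {float(bi):.2f}")
```

For an 8-node ring this gives 4 hops versus 16/7 (about 2.29) hops; for even n the bidirectional mean is exactly n²/(4(n − 1)).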

Analysis of Interconnection Networks for Cache Coherent Multiprocessors with Scientific Applications

This paper presents queueing network models of these interconnection networks for multiprocessors with private cache memories, based on single and multiple closed classes of customers and solved using simple mean value analysis (MVA) algorithms.
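For readers unfamiliar with MVA, the Python sketch below is the standard exact single-class recursion for a closed network of load-independent queueing stations. It is not the paper's own (possibly multi-class) model, and the station demands in the usage line are hypothetical.

```python
from typing import List, Tuple


def mva(demands: List[float], population: int,
        think_time: float = 0.0) -> Tuple[float, List[float], List[float]]:
    """Exact single-class Mean Value Analysis.

    demands[k] is the total service demand (visits x service time) at station k.
    Returns (throughput, per-station residence times, per-station mean queue
    lengths) at the requested population.
    """
    queue = [0.0] * len(demands)       # Q_k(0) = 0: the network starts empty
    residence = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, population + 1):
        # Arrival theorem: an arriving customer sees the mean queue of population n - 1.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        # Little's law applied to the whole closed loop.
        throughput = n / (think_time + sum(residence))
        # Little's law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, residence, queue


if __name__ == "__main__":
    # Two balanced stations with two customers: throughput is about 0.667.
    x, r, q = mva([1.0, 1.0], population=2)
    print(x, r, q)
```

Each iteration costs time linear in the number of stations, which is why MVA is attractive for quick design-space sweeps compared with simulation.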

Performance issues in the design of hierarchical-ring and direct networks for shared-memory multiprocessors

This dissertation explores performance issues in the design of interconnection networks for shared-memory multiprocessors and considers low-dimensional direct and hierarchical-ring networks, and studies issues in topology, buffer management, switching, routing and flow control.

DRACO: optimized CC-NUMA system with novel dual-link interconnections to reduce the memory latency

A dual-link interconnection topology and an effective routing scheme for it are proposed to reduce remote memory latency on the interconnection network, and the proposed system is shown to outperform both the traditional bidirectional ring-based system and the toroidal mesh-based system.

A comparative study of bidirectional ring and crossbar interconnection networks

A performance comparison of hierarchical ring- and mesh-connected multiprocessor networks

  • G. Ravindran, M. Stumm
  • Computer Science
    Proceedings Third International Symposium on High-Performance Computer Architecture
  • 1997
This paper compares the performance of hierarchical ring- and mesh-connected wormhole-routed shared-memory multiprocessor networks in a simulation study and shows that, for workloads with little locality, meshes scale better than ring networks because ring-based systems have limited bisection bandwidth.
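The bisection argument can be checked with a standard channel-load bound: under uniform random traffic roughly half of all messages cross the network bisection, so the sustainable injection rate per node is at most about 2·B/N, where B is the number of unidirectional channels crossing the bisection and each channel moves one flit per cycle. The sketch assumes a flat bidirectional ring and an even k × k mesh without wraparound; the top level of a hierarchical ring is itself a ring, so its bisection likewise does not grow with N (an assumption of this sketch, not a result quoted from the paper).

```python
import math


def bisection_channels(topology: str, n_nodes: int) -> int:
    """Unidirectional channels crossing a minimal bisection."""
    if topology == "ring":
        # A bidirectional ring is cut at two links, each carrying traffic both ways.
        return 4
    if topology == "mesh":
        k = math.isqrt(n_nodes)
        if k * k != n_nodes or k % 2:
            raise ValueError("mesh sketch assumes an even k x k mesh")
        # Cutting a k x k mesh down the middle severs k bidirectional links.
        return 2 * k
    raise ValueError(f"unknown topology: {topology}")


def uniform_injection_bound(topology: str, n_nodes: int) -> float:
    """Upper bound on flits/cycle each node can inject under uniform random traffic.

    About half of the N * rate flits injected per cycle must cross the bisection,
    so N * rate / 2 <= B, i.e. rate <= 2 * B / N.
    """
    return 2.0 * bisection_channels(topology, n_nodes) / n_nodes


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(n, uniform_injection_bound("ring", n), uniform_injection_bound("mesh", n))
```

The ring bound falls as 8/N while the mesh bound falls only as 4/√N, which matches the direction of the scaling result summarized above.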

Performance evaluation of modified hierarchical ring by exploiting link utilization and memory access locality

  • Jong Wook Kwak, C. Jhon
  • Computer Science
    Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005.
  • 2005
The Torus ring has an advantage over the hierarchical ring when the destination of a network packet is an adjacent local ring, especially in the backward direction, because it exploits memory access locality.

References


The Performance Of Cache-coherent Ring-based Multiprocessors

  • L. Barroso, M. Dubois
  • Computer Science
    Proceedings of the 20th Annual International Symposium on Computer Architecture
  • 1993
This paper evaluates the performance of the unidirectional slotted ring interconnection for small- to medium-scale shared-memory systems, using a hybrid methodology of analytical models and trace-driven simulations, and compares it to high-performance split-transaction buses.

Cache Coherence on a Slotted Ring

This paper introduces the Express Ring architecture and presents a snooping cache coherence protocol for this machine, and shows how consistency of shared memory accesses can be efficiently maintained in a ring-connected multiprocessor.

Analysis of multithreaded architectures for parallel computing

Analytical models of multithreaded processor behavior, based on a small set of architectural and program parameters, are developed, and prescriptive use of the model under various scenarios indicates that multithreading is effective.
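The flavor of such models can be shown with the simplest deterministic form: a thread runs for R cycles, then issues a long-latency access taking L cycles, and each context switch costs C cycles. Utilization grows linearly with the number of threads N until enough threads exist to cover the latency, then saturates. This parameterization is an assumption of the sketch (formulations differ, for instance, on whether C overlaps L), not necessarily the model in the cited paper, and the parameter values in the usage loop are hypothetical.

```python
def processor_utilization(threads: int, run_length: float,
                          latency: float, switch_cost: float) -> float:
    """Fraction of cycles doing useful work in a simple deterministic
    multithreading model (hypothetical parameterization)."""
    # Linear region: each thread contributes run_length useful cycles every
    # run_length + switch_cost + latency cycles, and the processor still idles.
    linear = threads * run_length / (run_length + switch_cost + latency)
    # Saturated region: a ready thread is always available, so only the
    # context-switch overhead is lost.
    saturated = run_length / (run_length + switch_cost)
    return min(linear, saturated)


def saturation_threads(run_length: float, latency: float, switch_cost: float) -> float:
    """Thread count at which the linear and saturated regions meet."""
    return (run_length + switch_cost + latency) / (run_length + switch_cost)


if __name__ == "__main__":
    R, L, C = 10, 100, 2  # hypothetical cycle counts
    print("saturation at", round(saturation_threads(R, L, C), 2), "threads")
    for n in (1, 2, 4, 8, 16):
        print(n, round(processor_utilization(n, R, L, C), 3))
```

The model captures the qualitative claim that multithreading hides latency, while also showing that switch cost caps the achievable utilization at R/(R + C).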

A Methodology for Performance Evaluation of Parallel Applications on Multiprocessors

Analysis and Comparison of Cache Coherence Protocols for a Packet-Switched Multiprocessor

Analytical models are developed for seven existing cache protocols, namely Write-Once, Write-Through, Synapse, Berkeley, Illinois, Firefly, and Dragon; the models incorporate requests for invalidation signals, write-through, and write-back operations, and the solution is based on the mean value analysis algorithm.

Comparative evaluation of latency reducing and tolerating techniques

Overall, it is shown that, using suitable combinations of the techniques, performance can be improved by 4 to 7 times, and that caches and relaxed consistency uniformly improve performance.

Directory-based cache coherence in large-scale multiprocessors

It is found that the best solutions to the cache-coherence problem result from a synergy between a multiprocessor's software and hardware components.

Hector: a hierarchically structured shared-memory multiprocessor

The architecture of the Hector multiprocessor, which exploits current microprocessor technology to produce a machine with a good cost/performance tradeoff, is described, and its interconnection backplane is a key design feature that can accommodate future technology.

Performance of the SCI Ring

The flow control mechanism of the SCI ring is shown to effectively prevent node starvation and reduce the ability of nodes to unfairly consume ring bandwidth, but at the cost of decreased overall ring utilization.

SPLASH: Stanford parallel applications for shared-memory

This work presents the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems, and describes the applications currently in the suite in detail.