Anatomy of a message in the Alewife multiprocessor

@inproceedings{Kubiatowicz1993AnatomyOA,
  title={Anatomy of a message in the Alewife multiprocessor},
  author={John D. Kubiatowicz and Anant Agarwal},
  booktitle={ICS '93},
  year={1993}
}
Shared memory provides a uniform and attractive mechanism for communication. For efficiency, it is often implemented with a layer of interpretive hardware on top of a message-passing communications network. This interpretive layer is responsible for data location, data movement, and cache coherence. It uses patterns of communication that benefit common programming styles, but which are only heuristics. This suggests that certain styles of communication may benefit from direct access to the…
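The toy model below makes the abstract's architecture concrete: shared memory layered on a message-passing network. It is our own simplified sketch, not the protocol described in the paper. Addresses are interleaved across nodes (data location), a remote read becomes a request/reply message pair (data movement), and each home node keeps a directory of sharers that it invalidates on a write (cache coherence). The classes `System` and `Node` and the message names `READ_REQ`, `READ_REPLY`, `WRITE_REQ`, and `INVALIDATE` are hypothetical. The model is also sequential and write-through, so it ignores the races and buffering that a real machine must handle.

```python
from collections import deque


class System:
    """Hypothetical toy machine: nodes exchanging messages over a FIFO network."""

    def __init__(self, num_nodes):
        self.queue = deque()  # in-flight messages: (dest, kind, payload)
        self.nodes = [Node(i, self) for i in range(num_nodes)]

    def home_of(self, addr):
        # Data location (assumption): addresses are interleaved across nodes.
        return addr % len(self.nodes)

    def send(self, dest, kind, **payload):
        self.queue.append((dest, kind, payload))

    def drain(self):
        # Deliver messages in FIFO order until the network is quiet.
        while self.queue:
            dest, kind, payload = self.queue.popleft()
            self.nodes[dest].handle(kind, payload)


class Node:
    def __init__(self, node_id, system):
        self.id = node_id
        self.sys = system
        self.memory = {}   # home memory for addresses this node owns
        self.sharers = {}  # directory: addr -> ids of nodes caching addr
        self.cache = {}    # local copies of (possibly remote) data

    def read(self, addr):
        if addr in self.cache:
            return self.cache[addr]  # cache hit: no messages sent
        # Data movement: a miss becomes a request to the home node.
        self.sys.send(self.sys.home_of(addr), "READ_REQ", addr=addr, src=self.id)
        self.sys.drain()
        return self.cache[addr]

    def write(self, addr, value):
        # Write-through to the home node (simplifying assumption).
        self.sys.send(self.sys.home_of(addr), "WRITE_REQ", addr=addr, value=value)
        self.sys.drain()

    def handle(self, kind, p):
        addr = p["addr"]
        if kind == "READ_REQ":  # runs at the home node
            self.sharers.setdefault(addr, set()).add(p["src"])
            self.sys.send(p["src"], "READ_REPLY", addr=addr,
                          value=self.memory.get(addr, 0))
        elif kind == "READ_REPLY":
            self.cache[addr] = p["value"]
        elif kind == "WRITE_REQ":  # runs at the home node
            # Cache coherence: invalidate every cached copy, then update memory.
            for sharer in self.sharers.pop(addr, set()):
                self.sys.send(sharer, "INVALIDATE", addr=addr)
            self.memory[addr] = p["value"]
        elif kind == "INVALIDATE":
            self.cache.pop(addr, None)


system = System(num_nodes=4)
a, b = system.nodes[0], system.nodes[1]
a.write(6, 42)    # address 6 lives on node 2 (6 % 4)
print(b.read(6))  # 42, fetched via READ_REQ/READ_REPLY through node 2
a.write(6, 7)     # node 2 invalidates b's cached copy
print(b.read(6))  # 7, re-fetched after the invalidation
```

The request/reply and invalidation patterns in this sketch are fixed, regardless of how the program actually shares data. This mirrors the abstract's point that such hardware communication patterns are only heuristics.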

Citations

Communication mechanisms in shared memory multiprocessors
TLDR
This paper examines the performance of five shared-memory communication mechanisms (invalidate-based cache coherence, prefetch, locks, deliver, and StreamLine) to determine the effectiveness of architectural support for efficient producer-initiated communication.
Integrating multiple communication paradigms in high performance multiprocessors
TLDR
The goal is to provide message-passing performance comparable to an aggressive hardware implementation dedicated to this task, and to provide an integrated solution that handles the interaction of message data with virtual memory, protected multiprogramming, and cache coherence.
The MIT Alewife machine: architecture and performance
TLDR
Analysis of the MIT Alewife machine shows that integrating message passing with shared memory enables a cost-efficient solution to the cache coherence problem and provides a rich set of programming primitives.
On the use and performance of explicit communication primitives in cache-coherent multiprocessor systems
  • Xiaohan Qin, J. Baer
  • Proceedings Third International Symposium on High-Performance Computer Architecture
  • 1997
TLDR
This paper proposes a set of communication primitives, implemented on a communication co-processor, that introduce a flavor of message passing and permit protocol optimization, and it assesses the overhead of the software implementation of the primitives and protocols.
The Stanford FLASH multiprocessor
TLDR
The architecture of FLASH and MAGIC is presented, the base cache-coherence and message-passing protocols are discussed, and latency and occupancy numbers derived from the system-level simulator and the Verilog code are given.
Parallel Communication Mechanisms for Sparse, Irregular Applications
TLDR
This thesis performs an in-depth study of the interaction between communication mechanisms and sparse, irregular applications, and presents the Remote Queues (RQ) communication model, an abstraction which synthesizes more efficient synchronization for hardware-supported shared memory and other complex systems.
Mechanisms and interfaces for software-extended coherent shared memory
TLDR
This dissertation proposes, designs, tests, measures, and models the novel software-extended memory system of Alewife, a large-scale multiprocessor architecture that facilitates the development of memory-system software and enables a smart memory system, which uses intelligence to help improve performance.
Message Passing Support in the Avalanche Widget
TLDR
A simulation study shows how a design called the Widget can be used with existing commercial workstation technology to significantly reduce message-passing costs, supporting efficient message passing in the Avalanche multiprocessing system.
MGS: A Multigrain Shared Memory System
TLDR
This paper introduces the design of a shared memory system that uses multiple granularities of sharing, and presents an implementation on the Alewife multiprocessor, called MGS, and finds that unmodified shared memory applications can exploit multigrain sharing.
...

References

Showing 1-10 of 41 references
The MIT Alewife machine: architecture and performance
TLDR
Analysis of the MIT Alewife machine shows that integrating message passing with shared memory enables a cost-efficient solution to the cache coherence problem and provides a rich set of programming primitives.
Active Messages: A Mechanism for Integrated Communication and Computation
TLDR
It is shown that active messages are sufficient to implement the dynamically scheduled languages for which message-driven machines were designed and that, with this mechanism, latency tolerance becomes a programming/compiling concern.
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor
TLDR
The Alewife architecture is described, with a focus on the machine's novel hardware features, including LimitLESS directories and the rapid context-switching processor.
Cooperative shared memory: software and hardware for scalable multiprocessors
TLDR
The initial implementation of cooperative shared memory uses a simple programming model, called Check-In/Check-Out (CICO), in conjunction with even simpler hardware, called Dir1SW, that adds little complexity to message-passing hardware, but efficiently supports programs written within the CICO model.
The J-Machine: A Fine-Grain Concurrent Computer
TLDR
The J-Machine is a fine-grain concurrent computer that provides low-overhead primitive mechanisms for communication, synchronization, and translation that efficiently support most proposed models of concurrent computation.
APRIL: a processor architecture for multiprocessing
TLDR
The authors show that the SPARC-based implementation of APRIL can achieve close to 80% processor utilization with as few as three resident threads per processor in a large-scale cache-based machine with an average base network latency of 55 cycles.
The J-machine multicomputer: an architectural evaluation
TLDR
The design of the J-Machine is discussed, and the effectiveness of the mechanisms incorporated into the MDP is evaluated by directly measuring the performance of the communication and synchronization mechanisms and by investigating the behavior of four complete applications.
The directory-based cache coherence protocol for the DASH multiprocessor
TLDR
The design of the DASH coherence protocol is presented and discussed from the viewpoint of how it addresses the issues of correctness, performance, and protocol complexity.
A tightly-coupled processor-network interface
TLDR
The interface architecture reduces communication overhead fivefold in the authors' benchmarks, and most of the performance gain comes from simple, low-cost hardware mechanisms for fast dispatching on, forwarding of, and replying to messages.
The Stanford Dash multiprocessor
TLDR
The overall goals and major features of the directory architecture for shared memory (Dash), a distributed directory-based protocol that provides cache coherence without compromising scalability, are presented.
...