MP-LOCKs: replacing H/W synchronization primitives with message passing

@article{Kuo1999MPLOCKsRH,
  title={MP-LOCKs: replacing H/W synchronization primitives with message passing},
  author={Chen-Chi Kuo and John B. Carter and Ravindra Kuramkote},
  journal={Proceedings Fifth International Symposium on High-Performance Computer Architecture},
  year={1999},
  pages={284--288}
}
Shared memory programs guarantee the correctness of concurrent accesses to shared data using interprocessor synchronization operations. The most common synchronization operators are locks, which are traditionally implemented via a mix of shared memory accesses and hardware synchronization primitives like test-and-set. In this paper, we argue that synchronization operations implemented using fast message passing and kernel-embedded lock managers are an attractive alternative to dedicated… 

Mechanisms for efficient shared-memory, lock-based synchronization
TLDR
It is found that QOLB, which is the first primitive to incorporate all four mechanisms, outperforms all other primitives in all cases, and a new locking primitive, called VAQUM, that has the potential to outperform existing primitives is proposed.
GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs
TLDR
This paper proposes and evaluates GLocks, a hardware-supported implementation for highly-contended locks in the context of many-core CMPs that skips the memory hierarchy to provide a non-intrusive, extremely efficient and fair lock implementation with negligible impact on energy consumption or die area.
Implementation and Evaluation of a Hardware Decentralized Synchronization Lock for MPSoCs
TLDR
This work provides a hardware decentralized solution to manage dynamic re-homing of locks in a dedicated memory, close to the latest access-granted core, which reduces overall access latency and network traffic in case of reuse of the lock within the same cluster.
Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives
TLDR
A systematic methodology for transforming any synchronization primitive that uses RMW instructions into a hybrid one is presented and experimental evidence on the effectiveness of using hybrid primitives in the implementation of spin locks, barriers and lock-free queues is provided in microbenchmarks and parallel applications on a SGI Origin 2000.
A quantitative architectural evaluation of synchronization algorithms and disciplines on ccNUMA systems: the case of the SGI Origin2000
This paper assesses the performance and scalability of several software synchronization algorithms, as well as the interrelationship between synchronization, multiprogramming and parallel job
Efficient synchronization and communication in many-core chip multiprocessors
TLDR
GBarrier is a hardware-based barrier mechanism especially aimed at providing efficient barriers in future many-core CMPs, and deploys a dedicated G-Line-based network to allow for fast and efficient signaling of barrier arrival and departure.
The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors
TLDR
This paper takes a new approach by analyzing the sources of synchronization latency on ccNUMA architectures and how this latency can be reduced by leveraging hardware and software schemes in both dedicated and multiprogrammed execution environments.
Accurate MPSoC Prototyping Platform and Methodology for the Studying of the Linux Synchronization Barrier Slowdown Issues
TLDR
A methodology to study the impact of hardware contention in the synchronization barrier mechanism running on a shared memory clustered MPSoC using a new observation methodology based on emulation to identify hardware module restrictions and Linux kernel suboptimal services is proposed.
A simple lock manager for distributed heterogeneous systems
TLDR
The concept of semaphore is used as a basic structure to manage critical regions in a distributed heterogeneous system and it is shown that it has the necessary and sufficient locking facilities and supports heterogeneous distribution.

References

Integrating message-passing and shared-memory: early experience
TLDR
An architecture, Alewife, is described that integrates support for shared-memory and message-passing through a simple interface and expects the compiler and runtime system to cooperate in using appropriate hardware mechanisms that are most efficient for specific operations.
Algorithms for scalable synchronization on shared-memory multiprocessors
TLDR
The principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors, and the existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides protection against so-called “dance hall” architectures.
Dynamic decentralized cache schemes for MIMD parallel processors
TLDR
It appears that moderately large parallel processors can be designed by employing the principles presented in this paper, and both schemes feature decentralized consistency control and dynamic type classification of the datum cached.
Reactive synchronization algorithms for multiprocessors
TLDR
The notion of consensus objects that the reactive algorithms use to preserve correctness in the face of dynamic protocol changes is described, and it is demonstrated that reactive algorithms perform close to the best static choice of protocols at all levels of contention.
The Stanford FLASH multiprocessor
TLDR
The architecture of FLASH and MAGIC is presented, the base cache-coherence and message-passing protocols are discussed, and latency and occupancy numbers, which are derived from the system-level simulator and the Verilog code, are given.
Distributed operating systems based on a protected global virtual address space
TLDR
It is believed that a distributed operating system built upon a software distributed shared memory system can provide the advantages of conventional message-based distributed operating systems, in addition to several other benefits e.g. easy sharing of complex data structures between processes, transparent replication of server functions, and a uniform interface for all communication.
Munin: distributed shared memory based on type-specific memory coherence
TLDR
This paper focuses on the design and use of Munin's memory coherence mechanisms, and compares the approach to previous work in this area.
Tempest and Typhoon: user-level shared memory
TLDR
The authors simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably to an all-hardware Dir_N NB cache-coherence protocol for five shared-memory programs.
Algorithmic foundations for a parallel vector access memory system
TLDR
The underlying PVA algorithms for both word-interleaved and cache-line-interleaved memory systems are presented, showing the regularity of vectors or streams to access them efficiently in parallel on a multi-bank SDRAM memory system.
An implementation of the Hamlyn sender-managed interface architecture
TLDR
The Hamlyn interface architecture uses sender-based memory management to eliminate receiver buffer overruns, provides applications with direct hardware access to minimize latency, supports adaptive routing networks to allow higher throughput, and offers full protection between applications so it can be used in a general-purpose computing environment.