A quantitative architectural evaluation of synchronization algorithms and disciplines on ccNUMA systems: the case of the SGI Origin2000

@inproceedings{Nikolopoulos1999AQA,
  title={A quantitative architectural evaluation of synchronization algorithms and disciplines on ccNUMA systems: the case of the SGI Origin2000},
  author={Dimitrios S. Nikolopoulos and Theodore S. Papatheodorou},
  booktitle={ICS '99},
  year={1999}
}
This paper assesses the performance and scalability of several software synchronization algorithms, as well as the interrelationship between synchronization, multiprogramming and parallel job scheduling, on ccNUMA systems. Using the SGI Origin2000, we evaluate synchronization algorithms for spin locks, lock-free concurrent queues, and barriers. We analyze the sensitivity of synchronization algorithms to the hardware implementation of elementary synchronization primitives and investigate in…
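The abstract lists spin locks among the evaluated algorithm classes. As a hedged illustration of why such locks are sensitive to the cache-coherence protocol, the sketch below shows a test-and-test-and-set (TTAS) spin lock written with C11 atomics and POSIX threads. It is not code from the paper: `ttas_lock`, `ttas_acquire`, `ttas_release`, `demo_worker` and `run_counter_demo` are hypothetical names chosen for illustration, and on a MIPS processor such as the Origin2000's the atomic exchange would typically be compiled to an LL/SC retry loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-test-and-set (TTAS) spin lock; an illustration,
   not code from the paper. */
typedef struct {
    atomic_bool held;
} ttas_lock;

static void ttas_init(ttas_lock *l) {
    atomic_init(&l->held, false);
}

static void ttas_acquire(ttas_lock *l) {
    for (;;) {
        /* Wait with ordinary loads: while the lock is held, each waiter
           spins on its locally cached copy instead of issuing
           read-modify-write operations that invalidate other caches. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
        /* The lock looked free: try to take it with one atomic exchange.
           If another waiter won the race, go back to spinning. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
    }
}

static void ttas_release(ttas_lock *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Demo: several threads increment a shared counter under the lock. */
#define MAX_THREADS 64

struct demo_state {
    ttas_lock lock;
    long counter;
    int iters;
};

static void *demo_worker(void *arg) {
    struct demo_state *s = arg;
    for (int i = 0; i < s->iters; i++) {
        ttas_acquire(&s->lock);
        s->counter++;               /* protected by the lock */
        ttas_release(&s->lock);
    }
    return NULL;
}

/* Returns the final counter value; it equals nthreads * iters only if
   the lock provides mutual exclusion. */
static long run_counter_demo(int nthreads, int iters) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    struct demo_state s;
    pthread_t tids[MAX_THREADS];
    ttas_init(&s.lock);
    s.counter = 0;
    s.iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, &s);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return s.counter;
}
```

For example, `run_counter_demo(4, 100000)` should return 400000. The split between a read-only spin and a single exchange is the design point: waiting processors stop generating invalidation traffic while the lock is held, although a burst of exchanges still occurs at every release, which is one reason queue-based locks scale better on ccNUMA machines.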
The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors
TLDR
This paper takes a new approach by analyzing the sources of synchronization latency on ccNUMA architectures and how this latency can be reduced by leveraging hardware and software schemes in both dedicated and multiprogrammed execution environments.
Fast synchronization on scalable cache-coherent multiprocessors using hybrid primitives
TLDR
A systematic methodology for transforming any synchronization primitive that uses RMW instructions into a hybrid one is presented, and experimental evidence on the effectiveness of using hybrid primitives in the implementation of spin locks, barriers and lock-free queues is provided in microbenchmarks and parallel applications on an SGI Origin 2000.
Busy-Wait Barrier Synchronization Using Distributed Counters with Local Sensor
TLDR
A new implementation, distributed counters with local sensor, is introduced, which considerably reduces overhead on POWER3 and POWER4 SMP systems; the relative performance of this implementation is expected to increase with the number of processors in an SMP and as memory latencies lengthen relative to cache latencies.
Evaluating the performance of non-blocking synchronization on shared-memory multiprocessors
TLDR
The goal of this work was to provide an in-depth understanding of how non-blocking synchronization can improve the performance of modern parallel applications by using the general translations that are provided in this paper.
Integrating non-blocking synchronisation in parallel applications: performance advantages and methodologies
TLDR
The results obtained show that for many applications, non-blocking synchronisation leads to significant speedups for a fairly large number of processors, while it never slows the applications down.
Evaluating The Performance of Non-Blocking Synchronisation on Modern Shared-Memory Multiprocessors
Parallel programs running on shared memory multiprocessors coordinate via shared data objects/structures. To ensure the consistency of the shared data structures, programs typically rely on some…
Performance Impact of Lock-Free Algorithms on Multicore Communication APIs
TLDR
Migration from single-core to multicore hardware architectures degrades lock-based performance and increases lock-free performance; the work also predicts performance at the system architecture level and provides a stop criterion for the refactoring.
Effects of Locking and Synchronization on Future Large Scale CMP Platforms
As we enter the era of large-scale Chip MultiProcessing (CMP) systems, evaluating architectures and projecting performance for commercial workloads on such systems is becoming increasingly important.
NOBLE: A Non-Blocking Inter-Process Communication Library
TLDR
This paper introduces library support for multi-process non-blocking synchronization called NOBLE, which provides an inter-process communication interface that allows the user to transparently select the synchronisation method that best suits the current application.
A New Prediction Oriented Barrier Synchronization on SMP Clusters
Clusters of Symmetric Multiprocessors (CSMP) are becoming an increasingly popular high-performance computing platform due to the commodity availability of multiprocessor nodes, mature SMP operating…

References
Evaluating synchronization on shared address space multiprocessors: methodology and performance
TLDR
It is found that although the efficient hardware support for synchronization provided on the SGI Origin 2000 machine usually helps lock and barrier microbenchmarks, it does not help in improving application performance when compared to good software algorithms that use the processor-provided LL-SC instructions.
Scheduler-conscious synchronization
TLDR
It is found that while it is possible to avoid pathological performance problems using previously proposed kernel mechanisms, a modest additional widening of the kernel/user interface can make scheduler-conscious synchronization algorithms significantly simpler and faster, with performance on dedicated machines comparable to that of scheduler-oblivious algorithms.
The impact of operating system scheduling policies and synchronization methods of performance of parallel applications
TLDR
This paper uses detailed simulation studies to evaluate the performance of several different scheduling strategies, and shows that in situations where the number of processes exceeds the number of processors, regular priority-based scheduling in conjunction with busy-waiting synchronization primitives results in extremely poor processor utilization.
Reactive synchronization algorithms for multiprocessors
TLDR
The paper describes the notion of consensus objects, which the reactive algorithms use to preserve correctness in the face of dynamic protocol changes, and demonstrates that reactive algorithms perform close to the best static choice of protocol at all levels of contention.
MP-LOCKs: replacing H/W synchronization primitives with message passing
TLDR
It is argued that synchronization operations implemented using fast message passing and kernel-embedded lock managers are an attractive alternative to dedicated synchronization hardware and should be considered as a replacement for hardware locks in future scalable multiprocessors that support efficient message passing mechanisms.
Synchronization algorithms for shared-memory multiprocessors
A performance evaluation of the Symmetry multiprocessor system revealed that the synchronization mechanism did not perform well for highly contested locks, like those found in certain parallel…
Relative performance of preemption-safe locking and non-blocking synchronization on multiprogrammed shared memory multiprocessors
TLDR
Results indicate that data-structure-specific non-blocking algorithms, which exist for stacks, FIFO queues and counters, can work extremely well: not only do they outperform preemption-safe lock-based algorithms on multiprogrammed machines, they also outperform ordinary locks on dedicated machines.
Algorithms for scalable synchronization on shared-memory multiprocessors
TLDR
The principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors, and the existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides protection against so-called “dance hall” architectures.
A Nonblocking Algorithm for Shared Queues Using Compare-and-Swap
TLDR
The authors present a simple and efficient nonblocking shared FIFO queue algorithm with O(n) system latency, no additional memory requirements, and enqueuing and dequeuing times independent of the size of the queue.
A Scalable Multi-Discipline, Multiple-Processor Scheduling Framework for IRIX
This document describes the processor scheduling framework implemented in the Silicon Graphics IRIX Version 5 operating system. This framework provides the standard features and behavior expected of…