Optimal Strategies for Spinning and Blocking

@article{Boguslavsky1994OptimalSF,
  title={Optimal Strategies for Spinning and Blocking},
  author={Leonid B. Boguslavsky and Karim Harzallah and Alexander Y. Kreinin and Kenneth C. Sevcik and Alek Vainshtein},
  journal={J. Parallel Distributed Comput.},
  year={1994},
  volume={21},
  pages={246--254}
}
In parallel and distributed computing environments, threads (or processes) share access to variables and data structures. To assure consistency during updates, locks are used. When a thread attempts to acquire a lock but finds it busy, it must choose between spinning, which means repeatedly attempting to acquire the lock in the hope that it will become free, and blocking, which means suspending its execution and relinquishing its processor to some other thread. The choice between spinning and… 
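The spin-versus-block choice described in the abstract can be sketched in a few lines. The following Python fragment is an illustrative sketch only, not the paper's implementation; the `spin_then_block` helper and its `spin_limit` parameter are hypothetical names introduced here:

```python
import threading
import time

def spin_then_block(lock: threading.Lock, spin_limit: int = 1000) -> None:
    """Try to acquire `lock` by spinning (repeated non-blocking attempts)
    up to `spin_limit` times, then fall back to a blocking acquire,
    which suspends the thread and yields the processor."""
    for _ in range(spin_limit):
        if lock.acquire(blocking=False):  # spin: cheap retry while holder may finish soon
            return
    lock.acquire()                        # block: relinquish the processor until free

# usage: another thread holds the lock briefly, then releases it
lock = threading.Lock()
lock.acquire()
holder = threading.Thread(target=lambda: (time.sleep(0.01), lock.release()))
holder.start()
spin_then_block(lock)   # spins first, then blocks until the holder releases
holder.join()
lock.release()
```

The trade-off the paper studies is exactly the choice of `spin_limit`: spinning wastes processor cycles if the wait is long, while blocking pays a fixed context-switch cost even if the lock would have been free almost immediately.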
APPLES: Efficiently Handling Spin-lock Synchronization on Virtualized Platforms
TLDR
A framework named AdaPtive Pause-Loop Exiting and Scheduling (APPLES) is proposed, which monitors the overhead caused by excessive spinning and by preempting spinning VCPUs, and periodically adjusts spinning thresholds to reduce that overhead.
A new look at the roles of spinning and blocking
TLDR
This paper analyzes the shifting trade-off between spinning and blocking synchronization, and presents a proof of concept implementation that matches or exceeds the performance of both user-level spin-locks and the pthread mutex under a wide range of load factors.
Reducing Scalability Collapse via Requester-Based Locking on Multicore Systems
  • Yan Cui, Yingxin Wang, Fei Wang
  • Computer Science
    2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
  • 2012
TLDR
A novel lock implementation that allows tasks blocked on a lock to either spin or maintain a power-saving state according to the number of lock requesters is proposed and shows better scalability and energy efficiency than mutex locks and adaptive locks.
Efficient locking for multicore architectures
The scalability of multithreaded applications on current multicore systems is hampered by the performance of critical sections, due in particular to the costs of access contention and cache misses.
Coscheduling in the multicore era: the art of doing things simultaneously
TLDR
In most cases, research suggests avoiding or reducing resource contention by grouping tasks skillfully – which is basically a form of coscheduling; an integration of these research results is still missing, however, leaving resource contention with a more or less noticeable impact on the performance of individual tasks.
No More Backstabbing... A Faithful Scheduling Policy for Multithreaded Programs
TLDR
A scheduling policy called Faithful Scheduling (FF) is presented, which dramatically reduces context-switches as well as lock-holder thread preemptions and achieves high performance for both lightly and heavily loaded systems.
Unlocking Energy
TLDR
This paper proposes simple lock-based techniques for improving the energy efficiency of these systems by 33% on average, driven by higher throughput, and without modifying the systems.
Towards scalability collapse behavior on multicores
TLDR
Two new techniques (lock contention aware scheduler and requester-based adaptive lock) are proposed to remove the scalability collapse on multicores and are implemented in the Linux kernel 2.6.4 and evaluated on an AMD 32-core system to verify their effectiveness.
Scalable storage managers for the multicore era
TLDR
This thesis shows how to move the database engine off the critical path, proving that database engines can achieve the scalability needed to exploit today's parallel hardware and identifies scheduling as a critical area for current and future systems.
Callisto: co-scheduling parallel runtime systems
TLDR
Callisto is introduced, a resource management layer for parallel runtime systems that eliminates almost all of the scheduler-related interference between concurrent jobs, while still allowing jobs to claim otherwise-idle cores.

References

Empirical studies of competitive spinning for a shared-memory multiprocessor
TLDR
Seven strategies for determining whether and how long to spin before blocking are studied, finding that the standard blocking strategy performs poorly compared to mixed strategies, and adaptive algorithms perform better than non-adaptive ones.
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors
  • T. Anderson
  • Computer Science
    IEEE Trans. Parallel Distributed Syst.
  • 1990
The author examines the questions of whether there are efficient algorithms for software spin-waiting given hardware support for atomic instructions, or whether more complex kinds of hardware support are needed.
The performance implications of thread management alternatives for shared-memory multiprocessors
TLDR
This paper examines the performance implications of several data structure and algorithm alternatives for thread management in shared-memory multiprocessors and presents an Ethernet-style backoff algorithm that largely eliminates this effect.
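The Ethernet-style backoff idea mentioned in this TLDR – doubling a randomized delay after each failed acquisition attempt – can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's code; the `backoff_spin_acquire` helper and its `base`/`cap` parameters are names introduced here:

```python
import random
import threading
import time

def backoff_spin_acquire(lock: threading.Lock,
                         base: float = 1e-6, cap: float = 1e-3) -> None:
    """Spin-wait with Ethernet-style randomized exponential backoff:
    after each failed attempt, sleep for a random delay drawn from a
    window that doubles (up to `cap`), so contending threads hammer
    the lock less often as contention persists."""
    delay = base
    while not lock.acquire(blocking=False):
        time.sleep(random.uniform(0, delay))  # randomized backoff delay
        delay = min(delay * 2, cap)           # double the window, capped

# usage: back off until another thread releases the lock
lock = threading.Lock()
lock.acquire()
holder = threading.Thread(target=lambda: (time.sleep(0.005), lock.release()))
holder.start()
backoff_spin_acquire(lock)   # retries with growing random delays
holder.join()
lock.release()
```

Randomizing the delay (rather than sleeping exactly `delay`) is what makes the scheme "Ethernet-style": it desynchronizes waiters so they do not all retry at the same instant.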
The impact of operating system scheduling policies and synchronization methods on the performance of parallel applications
TLDR
This paper uses detailed simulation studies to evaluate the performance of several different scheduling strategies, and shows that in situations where the number of processes exceeds the number of processors, regular priority-based scheduling in conjunction with busy-waiting synchronization primitives results in extremely poor processor utilization.
The effect of context switches on cache performance
TLDR
This work fed address traces of the processes running on a multi-tasking operating system through a cache simulator, to compute accurate cache-hit rates over short intervals, and estimated the cache performance reduction caused by a context switch.
Waiting algorithms for synchronization in large-scale multiprocessors
TLDR
Motivated by the observation that different synchronization types exhibit different wait-time distributions, a static choice of the polling interval L_poll can yield close to optimal on-line performance against an adversary that is restricted to choosing wait times from a fixed family of probability distributions.
Modeling and Measuring Multiprogramming and System Overheads on a Shared-Memory Multiprocessor: Case Study
A survey of synchronization methods for parallel computers
An examination is given of how traditional synchronization methods influence the design of MIMD (multiple-instruction multiple-data-stream) multiprocessors. She provides an overview of MIMD…
SPLASH: Stanford parallel applications for shared-memory
TLDR
This work presents the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems, and describes the applications currently in the suite in detail.
Competitive randomized algorithms for nonuniform problems
TLDR
New randomized on-line algorithms for snoopy caching and the spin-block problem are presented and achieve competitive ratios approaching e/(e−1) ≈ 1.58 against an oblivious adversary, a surprising improvement over the best possible ratio in the deterministic case.
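The e/(e−1) bound contrasts with the deterministic case, where the classic strategy – spin for exactly one context-switch cost C, then block – is 2-competitive against the clairvoyant optimum. A small numeric check of that deterministic bound (illustrative only; C and the sampled wait times are arbitrary values chosen here):

```python
import math

C = 1.0  # cost of blocking (one context switch), in arbitrary time units

def online_cost(wait: float) -> float:
    """Spin up to C, then block: pay the spin time, plus C more if we block."""
    return wait if wait <= C else C + C

def offline_cost(wait: float) -> float:
    """Clairvoyant optimum: spin iff the wait is shorter than a block."""
    return min(wait, C)

# worst case is a wait just past C: the online strategy pays 2C vs. C
ratios = [online_cost(w) / offline_cost(w) for w in (0.1, 0.5, 1.0, 1.5, 10.0)]
print(max(ratios))              # 2.0 -> the deterministic strategy is 2-competitive

# the randomized bound quoted above
print(math.e / (math.e - 1))    # ≈ 1.58
```

Drawing the spin duration at random from a suitable distribution over [0, C] is what lets the randomized algorithms beat the deterministic ratio of 2 in expectation.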