Multi-queues can be state-of-the-art priority schedulers

  • Anastasiia Postnikova, Nikita Koval, Giorgi Nadiradze, Dan Alistarh
  • Published 2 September 2021
  • Computer Science
  • Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Designing and implementing efficient parallel priority schedulers is an active research area. An intriguing proposed design is the Multi-Queue: given n threads and m ≥ n distinct priority queues, task insertions are performed uniformly at random, while, to delete, a thread picks two queues uniformly at random, and removes the observed task of higher priority. This approach scales well, and has probabilistic rank guarantees: roughly, the rank of each task removed, relative to remaining tasks in… 
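The insert and delete rules described in the abstract can be illustrated with a minimal, single-threaded sketch. This is not the authors' implementation: the class name `MultiQueue`, the `seed` parameter, the convention that a smaller number means higher priority, and the choice to sample two distinct queues are all assumptions made for illustration. A concurrent version would also need per-queue synchronization, which is omitted here.

```python
import heapq
import random


class MultiQueue:
    """Sequential sketch of a Multi-Queue (hypothetical class, not the paper's code)."""

    def __init__(self, num_queues, seed=None):
        if num_queues < 2:
            raise ValueError("need at least two queues for two-choice deletion")
        # Each queue is a binary min-heap; smaller priority values are served first (assumption).
        self._queues = [[] for _ in range(num_queues)]
        self._rng = random.Random(seed)

    def insert(self, priority, task):
        # Insert into a queue chosen uniformly at random.
        heapq.heappush(self._rng.choice(self._queues), (priority, task))

    def delete(self):
        # Pick two distinct queues uniformly at random (distinctness is an assumption).
        a, b = self._rng.sample(self._queues, 2)
        candidates = [q for q in (a, b) if q]
        if not candidates:
            return None  # both sampled queues were empty; the caller may retry
        # Remove the better of the two observed top tasks.
        best = min(candidates, key=lambda q: q[0])
        return heapq.heappop(best)
```

Because `delete` inspects only two queues, it can return `None` while other queues still hold tasks, and the task it returns is not necessarily the global minimum; that relaxation is what the rank guarantees mentioned above quantify.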
A scalable architecture for reprioritizing ordered parallelism
This work presents Hive, a task-based execution model and multicore architecture that extracts abundant fine-grain parallelism from algorithms with priority updates, while retaining their strict priority schedules.


The SprayList: a scalable relaxed priority queue
This work presents the SprayList, a scalable priority queue with relaxed ordering semantics that is comparable to a classic unordered SkipList, and proves that the running time of a DeleteMin operation is O(log^3 p), with high probability, independent of the size of the list.
The Power of Choice in Priority Scheduling
The analytic results inspire a new concurrent priority queue implementation, which improves upon the state of the art in terms of practical performance, and is based on a new technical connection between "heavily loaded" balls-into-bins processes and priority scheduling.
Lock-Free Algorithms under Stochastic Schedulers
This work considers a random process motivated by the analysis of lock-free concurrent algorithms under high memory contention, and provides asymptotically tight bounds for the system and individual latency of this general concurrency pattern.
Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms
This work presents an efficient method to deterministically parallelize iterative sequential algorithms, with provable runtime guarantees in terms of the number of executed tasks to completion.
Understanding priority-based scheduling of graph algorithms on a shared-memory platform
This paper performs a detailed empirical performance analysis of several advanced CPS designs in a state-of-the-art graph analytics framework and develops PMOD, a new CPS that is robust and delivers the highest performance overall.
The lock-free k-LSM relaxed priority queue
We present a new, concurrent, lock-free priority queue that relaxes the delete-min operation to allow deletion of any of the ρ smallest keys instead of only a minimal one, where ρ is a parameter that
Efficiency Guarantees for Parallel Incremental Algorithms under Relaxed Schedulers
This paper analyzes the efficiency guarantees provided by a range of incremental algorithms when parallelized via relaxed schedulers, and provides lower bounds showing that certain algorithms will inherently incur a non-trivial amount of wasted work due to scheduler relaxation.
SAM: Optimizing Multithreaded Cores for Speculative Parallelism
This work presents speculation-aware multithreading (SAM), a simple policy that addresses major performance pathologies of speculative parallelism by coordinating instruction dispatch and conflict resolution priorities and makes multithreaded cores much more beneficial on speculative parallel programs.
Distributionally Linearizable Data Structures
This work shows for the first time that, under a set of analytic assumptions, a family of relaxed concurrent data structures, including variants of MultiQueues, but also a new approximate counting algorithm called the MultiCounter, provides strong probabilistic guarantees on the degree of relaxation with respect to the sequential specification, in arbitrary concurrent executions.
Cilk: an efficient multithreaded runtime system
This paper shows that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance, and proves that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time and communication bounds all within a constant factor of optimal.