Oracle-guided scheduling for controlling granularity in implicitly parallel languages

@article{Acar2016OracleguidedSF,
  title={Oracle-guided scheduling for controlling granularity in implicitly parallel languages},
  author={Umut A. Acar and Arthur Chargu{\'e}raud and Mike Rainey},
  journal={Journal of Functional Programming},
  year={2016},
  volume={26}
}
Abstract
A classic problem in parallel computing is determining whether to execute a thread in parallel or sequentially. If small threads are executed in parallel, the overheads due to thread creation can overwhelm the benefits of parallelism, resulting in suboptimal efficiency and performance. If large threads are executed sequentially, processors may spin idle, resulting again in suboptimal efficiency and performance. This “granularity problem” is especially important in implicitly parallel…
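The trade-off described in the abstract can be made concrete with a minimal sketch (illustrative code, not from the paper): a recursive computation with a manual sequential cutoff, counting how many parallel tasks each choice of grain size would create. The `cutoff` value and the task-counting harness are assumptions for this sketch.

```python
# Minimal sketch of manual granularity control (illustrative, not the paper's code).
# Calls below `cutoff` run sequentially; larger calls would be forked as parallel
# tasks, so we count them to expose the task-creation overhead of each grain size.

def fib(n, cutoff, stats):
    if n < 2:
        return n
    if n >= cutoff:
        stats["tasks"] += 1   # big enough to fork as a parallel task
    return fib(n - 1, cutoff, stats) + fib(n - 2, cutoff, stats)

fine = {"tasks": 0}
coarse = {"tasks": 0}
assert fib(20, 2, fine) == 6765     # fork at every level: maximal overhead
assert fib(20, 15, coarse) == 6765  # sequentialize small calls: far fewer tasks
print(fine["tasks"], coarse["tasks"])
```

Both grain sizes compute the same answer; only the number of tasks created (and hence the scheduling overhead) differs, which is exactly the knob the granularity problem asks us to set well.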
Heartbeat scheduling: provable efficiency for nested parallelism
TLDR
This paper presents a scheduling technique that delivers provably efficient results for arbitrary nested-parallel programs without the manual tuning otherwise needed to control parallelism overheads, along with a prototype C++ implementation and an evaluation showing that Heartbeat competes well with manually optimized Cilk Plus codes.
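As a rough illustration of the heartbeat idea (a hedged sketch; the constant, class, and method names here are invented, not the paper's API): instead of forking at every opportunity, a worker keeps latent parallelism aside and promotes at most one latent task per fixed amount of work performed, which bounds task-creation overhead by construction.

```python
HEARTBEAT = 1000  # work units between promotions (illustrative constant)

class HeartbeatWorker:
    """Sketch: defer parallelism, promoting one latent task per heartbeat."""
    def __init__(self):
        self.work_since_beat = 0
        self.latent = []        # deferred parallel opportunities, oldest first

    def defer(self, task):
        self.latent.append(task)

    def tick(self, work_units):
        """Account for work done; on a heartbeat, promote the oldest latent task."""
        self.work_since_beat += work_units
        if self.work_since_beat >= HEARTBEAT and self.latent:
            self.work_since_beat = 0
            return self.latent.pop(0)   # promoted to a real parallel task
        return None

w = HeartbeatWorker()
for i in range(5):
    w.defer(f"task{i}")
promoted = [t for _ in range(3000) if (t := w.tick(1)) is not None]
print(promoted)
```

After 3000 units of work only three of the five latent tasks are promoted, so the ratio of tasks created to work performed is capped regardless of how much latent parallelism the program exposes.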
Performance challenges in modular parallel programs
TLDR
This paper considers a result from the functional-programming community whose starting point is an "oracle" that can predict the work of parallel codes and thereby control granularity, and discusses the challenges in implementing such an oracle and proving that it has the desired theoretical properties under the nested-parallel programming model.
Task parallel assembly language for uncompromising parallelism
TLDR
The evaluation, conducted on both the Linux and the Nautilus kernels and considering a range of heartbeat interrupt mechanisms, shows that TPAL can dramatically reduce the overheads of parallelism without compromising scalability.
Provably and practically efficient granularity control
TLDR
This paper gives an algorithm for implementing an oracle, proves that it has the desired theoretical properties under the nested-parallel programming model, implements the oracle in C++ by extending Cilk, and evaluates its practical performance.
Fairness in responsive parallelism
TLDR
An algorithm designed to approximate the fairly prompt scheduling principle on multicore computers is presented, implemented by extending the Standard ML language, along with an empirical evaluation.
Disentanglement in nested-parallel programs
TLDR
This paper identifies a memory property of nested-parallel programs, called disentanglement, proposes memory management techniques that take advantage of disentanglement for improved efficiency and scalability, and shows that these techniques are practical by extending the MLton compiler for Standard ML to support this form of nested parallelism.
DePa: Simple, Provably Efficient, and Practical Order Maintenance for Task Parallelism
TLDR
The proposed algorithm, called DePa, represents a computation as a graph and encodes vertices in the graph with two components: a dag-depth and a fork-path, and is work-efficient and fully parallel.
Responsive parallel computation: bridging competitive and cooperative threading
TLDR
This work extends the classic graph-based cost model for cooperative threading to allow for competitive threading, and describes how such a cost model may be used in a programming language by presenting a language and a corresponding cost semantics.
Responsive parallelism with futures and state
TLDR
To reason about the responsiveness of λi4 programs, traditional graph-based cost models for parallelism are extended to account for dependencies created via mutable state, and a type system is presented to outlaw priority inversions that can lead to unbounded blocking.
...

References

SHOWING 1-10 OF 88 REFERENCES
Oracle scheduling: controlling granularity in implicitly parallel languages
TLDR
It is proved that, for a class of computations, oracle scheduling can reduce task creation overheads to a small fraction of the work without adversely affecting available parallelism, thereby leading to efficient parallel executions.
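The oracle idea can be sketched as follows (a hedged illustration; `KAPPA`, the cost model, and the online constant estimate are assumptions of this sketch, not the paper's actual algorithm): the programmer supplies an abstract cost for each task, the runtime maintains a measured constant translating abstract cost into time, and a task is sequentialized whenever its predicted running time falls below a threshold.

```python
import time

KAPPA = 1e-3  # sequentialize tasks predicted to run shorter than this (seconds)

class Oracle:
    """Sketch of a work oracle: predicts running time from an abstract cost."""
    def __init__(self, seconds_per_unit=1e-7):
        self.c = seconds_per_unit      # refined from measurements over time

    def predict(self, cost):
        return cost * self.c

    def report(self, cost, elapsed):
        if cost > 0:                   # crude online averaging (illustrative)
            self.c = 0.5 * self.c + 0.5 * (elapsed / cost)

def oracle_guided(oracle, cost, run_par, run_seq):
    """Fork only when the oracle predicts the task is big enough to pay for it."""
    start = time.perf_counter()
    result = run_seq() if oracle.predict(cost) < KAPPA else run_par()
    oracle.report(cost, time.perf_counter() - start)
    return result

o = Oracle()
small = oracle_guided(o, cost=10, run_par=lambda: "forked", run_seq=lambda: "inline")
big = oracle_guided(o, cost=10**9, run_par=lambda: "forked", run_seq=lambda: "inline")
print(small, big)
```

The point of the threshold is the one proved in the paper: tasks that fork are guaranteed to be large enough that the fixed task-creation cost is a small fraction of their work.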
Effective scheduling techniques for high-level parallel programming languages
TLDR
The starting point of this dissertation is work stealing, a scheduling policy well known for its scalable parallel performance, and the work-first principle, which serves as a guide for building efficient implementations of work stealing.
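The work-first principle mentioned above is commonly realized with per-worker deques (a minimal unsynchronized sketch; a real runtime needs a concurrent protocol such as the Chase-Lev deque): the owner pushes and pops at one end so its common path stays cheap, while idle thieves steal from the other end, taking the oldest, and typically largest, task.

```python
from collections import deque

class WorkStealingDeque:
    """Unsynchronized sketch of a per-worker deque for work stealing."""
    def __init__(self):
        self._d = deque()

    def push(self, task):      # owner: push newest work on the hot end
        self._d.append(task)

    def pop(self):             # owner: LIFO pop keeps the work-first path cheap
        return self._d.pop() if self._d else None

    def steal(self):           # thief: FIFO steal takes the oldest (largest) task
        return self._d.popleft() if self._d else None

q = WorkStealingDeque()
for t in ["outer", "middle", "inner"]:
    q.push(t)
print(q.pop(), q.steal())   # owner gets "inner", a thief gets "outer"
```

Stealing from the old end is what makes stealing rare and coarse-grained: a stolen task tends to carry a large subtree of the computation, amortizing the cost of the steal.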
Lazy task creation: a technique for increasing the granularity of parallel programs
TLDR
This paper rejects the simpler load-based inlining method, where tasks are combined based on dynamic load level, in favor of the safer and more robust lazy task creation method, which allows efficient execution of naturally expressed algorithms of a substantially finer grain than possible with previous parallel Lisp systems.
A scheduling framework for general-purpose parallel languages
TLDR
This paper describes the scheduling framework that is designed and implemented for Manticore, a strict parallel functional language, and takes a micro-kernel approach: the compiler and runtime support a small collection of scheduling primitives upon which complex scheduling policies can be implemented.
Implicitly-threaded parallelism in Manticore
TLDR
This paper presents Manticore, a language for building parallel applications on commodity multicore hardware including a diverse collection of parallel constructs for different granularities of work, and focuses on the implicitly-threaded parallel constructs in the high-level functional language.
Space-efficient scheduling for parallel, multithreaded computations
TLDR
This dissertation presents two asynchronous scheduling algorithms that provide worst-case upper bounds on the space and time requirements of high-level, nested-parallel programs on shared memory machines and provide a user-adjustable trade-off between running time and memory requirement.
A Methodology for Granularity-Based Control of Parallelism in Logic Programs
TLDR
This paper describes a methodology whereby the granularity of parallel tasks is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled.
Harnessing the Multicores: Nested Data Parallelism in Haskell
TLDR
This talk will describe Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC, and will focus particularly on the vectorisation transformation, which transforms nested data parallelism to flat data parallelism.
Lazy tree splitting
TLDR
This paper describes the implementation of NDP in Parallel ML (PML), part of the Manticore project, and describes LTS-based implementations of standard NDP operations, and presents experimental data demonstrating the scalability of LTS across a range of benchmarks.
Backtracking-based load balancing
TLDR
This paper proposes a "logical thread"-free framework called Tascell, which achieves higher performance and supports a wider range of parallel environments, including clusters, without loss of productivity, and enables elegant and highly efficient backtrack search algorithms with delayed workspace copying.
...