• Corpus ID: 14486805

Chapter 35: Dynamic Load Balancing Using Work-Stealing

Daniel Cederman and Philippas Tsigas

In this chapter, we present a methodology for efficient load balancing of computational problems that can be easily decomposed into multiple tasks, but where the computational cost of each task is hard to predict and new tasks are created dynamically at runtime. We examine how this methodology can be exploited, and how feasible it is, in the context of graphics processors. Work-stealing allows an idle core to acquire tasks from a core that is overloaded, causing the total work to be…
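
The stealing idea in the abstract can be illustrated with a small, single-threaded simulation. This is only a sketch under stated assumptions, not the authors' GPU implementation: the function name `run_work_stealing`, the round-based scheduling, and the task model (an integer that splits into two halves until it reaches 1) are all hypothetical, and the lock-free synchronization a real deque would need is left out.

```python
import random
from collections import deque


def run_work_stealing(initial_tasks, num_workers=4, seed=0):
    """Round-based simulation of work-stealing (illustrative only).

    Hypothetical workload: a task is an integer n. Processing it spawns
    two child tasks, n // 2 and n - n // 2, whenever n > 1.
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_workers)]
    deques[0].extend(initial_tasks)  # all work starts on worker 0
    processed = [0] * num_workers    # tasks completed per worker
    steals = 0

    while any(deques):  # an empty deque is falsy
        for w in range(num_workers):
            if deques[w]:
                # The owner takes the most recently added task.
                task = deques[w].pop()
            else:
                # An idle worker picks a random non-empty victim.
                victims = [v for v in range(num_workers)
                           if v != w and deques[v]]
                if not victims:
                    continue
                victim = rng.choice(victims)
                # The thief takes the victim's oldest task.
                task = deques[victim].popleft()
                steals += 1
            processed[w] += 1
            if task > 1:
                half = task // 2
                deques[w].append(half)
                deques[w].append(task - half)
    return processed, steals


if __name__ == "__main__":
    done, stolen = run_work_stealing([64], num_workers=4)
    print("tasks per worker:", done, "steals:", stolen)
```

Taking from opposite ends of the deque is a common design choice: the owner works on its newest tasks while thieves remove the oldest ones, which in this workload are the largest remaining pieces.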



On dynamic load balancing on graphics processors

Four dynamic load balancing methods are compared to see which is best suited to the highly parallel world of graphics processors. The comparison shows that lock-free methods achieve better performance than blocking ones and that they can be made to scale with increasing numbers of processing units.

Scheduling multithreaded computations by work stealing

  • R. Blumofe
  • Computer Science
    Proceedings 35th Annual Symposium on Foundations of Computer Science
  • 1994
This paper gives the first provably good work-stealing scheduler for multithreaded computations with dependencies, and shows that the expected time $T_P$ to execute a fully strict computation on $P$ processors using this work-stealing scheduler is $T_P = O(T_1/P + T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and …
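
As a rough illustration of the bound $T_P = O(T_1/P + T_\infty)$, consider a worked instance with hypothetical numbers, ignoring the constant factor hidden by the $O$-notation. Assume $T_1 = 1000$ time units of total work, a second term of $T_\infty = 10$ time units, and $P = 8$ processors:

\[
\frac{T_1}{P} + T_\infty = \frac{1000}{8} + 10 = 135,
\qquad
\frac{T_1}{135} \approx 7.4 .
\]

Under these assumed values the estimated speedup is about 7.4 out of an ideal 8, because $T_1/P$ dominates $T_\infty$. If $T_\infty$ were 500 instead, the same sum would be 625 and the estimated speedup would fall to 1.6.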

Thread Scheduling for Multiprogrammed Multiprocessors

This work presents a user-level thread scheduler for shared-memory multiprocessors, and it achieves linear speedup whenever $P$ is small relative to the parallelism $T_1/T_\infty$.

The art of multiprocessor programming

This talk will survey the area of transactional memory, a computational model in which threads synchronize by optimistic, lock-free transactions, with a focus on open research problems.

Lock-free Concurrent Data Structures

This chapter provides sufficient background and intuition to help the interested reader navigate the complex research area of lock-free data structures, and gives the programmer enough familiarity with the subject to use truly concurrent methods.

The Synchronization Power of Coalesced Memory Accesses

This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures such as the Compute Unified Device Architecture (CUDA).

Cilk: an efficient multithreaded runtime system

This paper shows that on real and synthetic applications, the “work” and “critical path” of a Cilk computation can be used to accurately model performance, and proves that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time and communication bounds all within a constant factor of optimal.

Cilk: an efficient multithreaded runtime

  • Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
  • 1995