Marco Bungart

X10's Global Load Balancing framework GLB implements a user-level task pool for inter-place load balancing. It is based on work stealing and deploys the lifeline algorithm. A single worker per place alternates between processing tasks and answering steal requests. We have devised an efficient fault-tolerance scheme for this algorithm, improving on a …
Scalability postulates fault tolerance to be effective. We consider a user-level fault tolerance technique to cope with permanent node failures. It is supported by X10, one of the major Partitioned Global Address Space (PGAS) languages. In Resilient X10, an exception is thrown when a place (node) fails. This paper investigates task pools, which are often …
Fault tolerance is of increasing importance for parallel computing. While often addressed at system level, application-level resilience techniques may be more efficient. In particular, it seems worthwhile to provide fault-tolerant libraries for reusable patterns such as the task pool. We consider a task pool variant that uses cooperative work stealing, …
Current HPC environments require parallel programs that are both malleable and fault-tolerant. Malleability denotes the ability to embrace system-initiated resource changes, and fault tolerance denotes the ability to cope with, e.g., permanent node failures. This paper considers the task pool pattern, specifically its lifeline-based variant. It builds on a …