Learn More
The Standard Template Adaptive Parallel Library (stapl) is a high-productivity parallel programming framework that extends C++ and stl with unified support for shared and distributed memory parallelism. stapl provides distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide(More)
The Standard Template Adaptive Parallel Library (STAPL) is a parallel programming infrastructure that extends C++ with support for parallelism. It includes a collection of distributed data structures called pContainers that are thread-safe, concurrent objects, i.e., shared objects that provide parallel methods that can be invoked concurrently. In this work,(More)
Load balance is critical for performance in large parallel applications. An imbalance on today's fastest supercomputers can force hundreds of thousands of cores to idle, and on future exascale machines this cost will increase by over a factor of a thousand. Improving load balance requires a detailed understanding of the amount of computational load per(More)
N-body methods simulate the evolution of systems of particles (or bodies). They are critical for scientific research in fields as diverse as molecular dynamics, astrophysics, and material science. Most load balancing techniques for N-body methods use particle count to approximate computational work. This approximation is inaccurate, especially for systems(More)
We present the design and implementation of the stapl pList, a parallel container that has the properties of a sequential list, but allows for scalable concurrent access when used in a parallel program. The Standard Template Adaptive Parallel Library (stapl) is a parallel programming library that extends C++ with support for parallelism. stapl provides a(More)
Increasing architectural diversity makes performance portability extremely important for parallel simulation codes. Emerging on-node parallelization frameworks such as Kokkos and RAJA decouple the work done in kernels from the parallelization mechanism, allowing for a single source kernel to be tuned for different architectures at compile time. However,(More)
In many parallel scientific simulations, work is assigned to processors by decomposing a spatial domain consisting of mesh cells, particles, or other elements. When work per element changes, simulations can use dynamic load balance algorithms to distribute work to processors evenly. Typical SPMD simulations wait while a load balance algorithm runs on all(More)
Locating a particular radiograph from a radiographic envelope can be a frustrating and time-consuming process. The aim of this study was to evaluate the role of colour and number coding of radiographs in reducing time taken to locate films and thereby improve efficiency. The time taken by clinicians to retrieve films from a radiographic envelope was(More)
We have enabled work migration in the CoMD proxy application to study dynamic load imbalance. Proxy applications are developed to simplify studying parallel performance of scientific simulations and to test potential solutions for performance problems. However, proxy applications are typically too simple to allow work migration or to represent the load(More)