Adaptive Space-Shared Scheduling for Shared-Memory Parallel Programs

Younghyun Cho, Surim Oh, Bernhard Egger
Space-sharing is regarded as the proper resource management scheme for many-core OSes. For today’s many-core chips and parallel programming models providing no explicit resource requirements, an important research problem is to provide a proper resource allocation to the running applications while considering not only the architectural features but also the characteristics of the parallel applications. 
Maximizing system utilization via parallelism management for co-located parallel applications
NuPoCo, a framework for automatically managing the parallelism of co-located parallel applications on NUMA multi-socket multi-core systems, achieves a reduction of the total turnaround time by 10-20% compared to the default Linux scheduler and an existing parallelism management policy focusing on CPU utilization only.
Chunking for Dynamic Linear Pipelines
The evaluation on 44 cores shows that chunking brings the overhead of dynamic scheduling down to that of a static scheduler, and it enables efficient and scalable execution of fine-grained dynamic linear pipelines.
TINS: A Task-Based Dynamic Helper Core Strategy for In Situ Analytics
This paper relies on a work stealing scheduler to implement TINS, a task-based in situ framework with an on-demand analytics isolation that shows up to 40% performance improvement over various other approaches including the standard helper core.
Integration of High-Performance Task-Based In Situ for Molecular Dynamics on Exascale Computers. (Développement d'un système in situ à base de tâches pour un code de dynamique moléculaire classique adapté aux machines exaflopiques)
This thesis proposes to study the design and integration of a novel task-based in situ framework inside a task-based molecular dynamics code designed for exascale supercomputers, and takes advantage of the composability properties of the task-based programming model to implement the TINS hybrid framework.
Adaptive Scheduling of Multiprogrammed Dynamic-Multithreading Applications


Automatic Co-scheduling Based on Main Memory Bandwidth Usage
A set of libraries and a first HPC scheduler prototype are presented that automatically detect an application’s main memory bandwidth utilization and prevent the co-scheduling of multiple main-memory-bandwidth-limited applications.
Callisto: co-scheduling parallel runtime systems
This paper introduces Callisto, a resource management layer for parallel runtime systems that eliminates almost all of the scheduler-related interference between concurrent jobs, while still allowing jobs to claim otherwise-idle cores.
Smart, adaptive mapping of parallelism in the presence of external workload
  • M. Emani, Zheng Wang, M. O’Boyle
  • Computer Science
    Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
  • 2013
This paper describes an automatic approach that combines compile-time knowledge of the program with dynamic runtime workload information to determine the best adaptive mapping of programs to available resources and delivers increased performance for the target application without penalizing the existing workload.
A Practical Approach for Performance Analysis of Shared-Memory Programs
  • B. Tudor, Y. M. Teo
  • Computer Science
    2011 IEEE International Parallel & Distributed Processing Symposium
  • 2011
The proposed model derives the speedup and speedup loss from data dependency and memory overhead for various configurations of threads, cores and memory access policies in UMA and NUMA systems, and applies it to determine the optimal number of cores that alleviates memory contention, maximizing speedup and reducing execution time.
A workload-aware mapping approach for data-parallel programs
This paper develops an approach for predicting the optimal number of threads for a given data-parallel application in the presence of external workload and develops an alternative cooperative model that minimizes the impact on external workload while still giving an improved average speedup.
Parcae: a system for flexible parallel execution
This paper presents Parcae, a generally applicable automatic system for platform-wide dynamic tuning that creates flexible parallel programs whose tasks can be efficiently reconfigured during execution and that outperform the original parallel implementations in many interesting scenarios.
Scalability-based manycore partitioning
A sophisticated scheduler is developed that dynamically predicts the scalability of programs via the use of hardware performance monitoring units, decides the optimal number of cores to be allocated for each program, and allocates the cores to programs while maximizing the system utilization to achieve fair and maximum performance.
Parallel Job Scheduling - A Status Report
The purpose of the present paper is to update material on the scheduling of parallel jobs, and to extend it to include work concerning clusters and the grid.
The multikernel: a new OS architecture for scalable multicore systems
This work investigates a new OS structure, the multikernel, that treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate via message-passing.
The PARSEC benchmark suite: Characterization and architectural implications
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.