Adaptive Space-Shared Scheduling for Shared-Memory Parallel Programs
@inproceedings{Cho2016AdaptiveSS,
  title     = {Adaptive Space-Shared Scheduling for Shared-Memory Parallel Programs},
  author    = {Younghyun Cho and Surim Oh and Bernhard Egger},
  booktitle = {JSSPP},
  year      = {2016}
}
Space-sharing is widely regarded as the appropriate resource management scheme for many-core OSes. Because today's many-core chips and parallel programming models provide no explicit resource requirements, an important research problem is to allocate resources properly to the running applications while considering not only the architectural features of the chip but also the characteristics of the parallel applications.
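To make the idea concrete, here is a minimal sketch of space-shared core partitioning: each application receives a dedicated set of cores sized by its estimated scalability. The greedy marginal-gain rule and the Amdahl-style speedup estimate below are illustrative assumptions, not the paper's actual policy.

```python
# Sketch: partition a fixed pool of cores among co-running applications.
# 'serial_frac' stands in for the per-application characteristics the
# paper argues a scheduler must take into account.

def speedup(app, cores):
    """Amdahl-style speedup estimate (an assumed model, for illustration)."""
    f = app["serial_frac"]
    return 1.0 / (f + (1.0 - f) / cores)

def partition(apps, total_cores):
    alloc = {a["name"]: 1 for a in apps}           # every app gets one core
    for _ in range(total_cores - len(apps)):       # hand out the rest greedily
        best = max(apps, key=lambda a: speedup(a, alloc[a["name"]] + 1)
                                       - speedup(a, alloc[a["name"]]))
        alloc[best["name"]] += 1
    return alloc

apps = [{"name": "A", "serial_frac": 0.05},
        {"name": "B", "serial_frac": 0.40}]
print(partition(apps, 16))   # the more scalable app receives most cores
```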
5 Citations
Maximizing system utilization via parallelism management for co-located parallel applications
- Computer Science · PACT
- 2018
NuPoCo, a framework for automatically managing the parallelism of co-located parallel applications on NUMA multi-socket multi-core systems, reduces the total turnaround time by 10-20% compared to the default Linux scheduler and an existing parallelism management policy that considers CPU utilization only.
Chunking for Dynamic Linear Pipelines
- Computer Science · ACM Trans. Archit. Code Optim.
- 2020
The evaluation on 44 cores shows that chunking brings the overhead of dynamic scheduling down to that of a static scheduler, and it enables efficient and scalable execution of fine-grained dynamic linear pipelines.
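The core idea of chunking can be shown in a few lines: workers claim a batch of pipeline items per synchronization instead of one, amortizing the scheduling overhead. The chunk size and the squaring "stage" below are assumptions for illustration, not the paper's implementation.

```python
import threading

items = list(range(10_000))
next_idx = 0
lock = threading.Lock()
CHUNK = 64            # one atomic claim per 64 items instead of per item

def worker(results):
    global next_idx
    while True:
        with lock:                     # single synchronization per chunk
            start = next_idx
            next_idx += CHUNK
        if start >= len(items):
            return
        for x in items[start:start + CHUNK]:
            results.append(x * x)      # stand-in for a pipeline stage

results = []
threads = [threading.Thread(target=worker, args=(results,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(len(results))    # 10000: every item processed exactly once
```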
TINS: A Task-Based Dynamic Helper Core Strategy for In Situ Analytics
- Computer Science · SCFA
- 2018
This paper implements TINS, a task-based in situ framework with on-demand analytics isolation, on top of a work-stealing scheduler, and shows up to 40% performance improvement over various other approaches, including the standard helper-core method.
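For readers unfamiliar with the scheduling substrate TINS builds on, here is a minimal work-stealing sketch: each worker pops from its own deque and steals from a victim when empty. This is a generic illustration of work stealing, not the TINS framework itself; the coarse single lock is a simplification.

```python
import collections, random, threading

NWORKERS = 4
deques = [collections.deque(range(i * 100, (i + 1) * 100)) for i in range(NWORKERS)]
lock = threading.Lock()        # one coarse lock keeps the toy example simple
done = []

def worker(wid):
    while True:
        with lock:
            if deques[wid]:
                task = deques[wid].pop()                 # LIFO from own deque
            else:
                victims = [d for d in deques if d]
                if not victims:
                    return                               # global quiescence
                task = random.choice(victims).popleft()  # FIFO steal
        done.append(task * task)                         # stand-in for real work

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NWORKERS)]
for t in threads: t.start()
for t in threads: t.join()
print(len(done))   # 400: every task ran exactly once
```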
Integration of High-Performance Task-Based In Situ for Molecular Dynamics on Exascale Computers. (Développement d'un système in situ à base de tâches pour un code de dynamique moléculaire classique adapté aux machines exaflopiques)
- Computer Science
- 2018
This thesis studies the design and integration of a novel task-based in situ framework inside a task-based molecular dynamics code designed for exascale supercomputers, and leverages the composability properties of the task-based programming model to implement the TINS hybrid framework.
Adaptive Scheduling of Multiprogrammed Dynamic-Multithreading Applications
- Computer Science · Journal of Parallel and Distributed Computing
- 2022
References
Automatic Co-scheduling Based on Main Memory Bandwidth Usage
- Computer Science · JSSPP
- 2016
A set of libraries and a first HPC scheduler prototype is presented that automatically detects an application's main-memory bandwidth utilization and prevents the co-scheduling of multiple main-memory bandwidth-limited applications.
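A toy version of the co-scheduling rule might look like the sketch below: never place two applications that each saturate main-memory bandwidth on the same node. The threshold, the measured bandwidths, and the greedy placement order are made-up values for illustration, not the prototype's actual mechanism.

```python
BW_LIMIT_GBS = 20.0    # assumed per-node threshold above which an app counts
                       # as memory-bandwidth limited

def schedule(apps, nodes):
    """apps: list of (name, measured_bandwidth_GBs); nodes: node count."""
    placement = {n: [] for n in range(nodes)}
    for name, bw in sorted(apps, key=lambda a: -a[1]):
        limited = bw >= BW_LIMIT_GBS
        for n in range(nodes):
            has_limited = any(b >= BW_LIMIT_GBS for _, b in placement[n])
            if not (limited and has_limited):   # avoid pairing two limited apps
                placement[n].append((name, bw))
                break
    return placement

apps = [("stream", 35.0), ("graph", 28.0), ("compute", 3.0), ("zip", 5.0)]
print(schedule(apps, 2))   # the two bandwidth-hungry apps land on different nodes
```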
Callisto: co-scheduling parallel runtime systems
- Computer Science · EuroSys '14
- 2014
Callisto is introduced, a resource management layer for parallel runtime systems that eliminates almost all of the scheduler-related interference between concurrent jobs, while still allowing jobs to claim otherwise-idle cores.
Smart, adaptive mapping of parallelism in the presence of external workload
- Computer Science · Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
- 2013
This paper describes an automatic approach that combines compile-time knowledge of the program with dynamic runtime workload information to determine the best adaptive mapping of programs to available resources and delivers increased performance for the target application without penalizing the existing workload.
A Practical Approach for Performance Analysis of Shared-Memory Programs
- Computer Science · 2011 IEEE International Parallel & Distributed Processing Symposium
- 2011
The proposed model derives speedup and speedup loss from data dependency and memory overhead for various configurations of threads, cores, and memory-access policies in UMA and NUMA systems, and is applied to determine the optimal number of cores that alleviates memory contention, maximizing speedup and reducing execution time.
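A hedged sketch of such a model follows: an Amdahl-style term for parallelism plus a contention penalty that grows with core count, swept to find the core count where contention outweighs added parallelism. The functional form and the constants are assumptions standing in for the paper's actual derivation.

```python
def predicted_speedup(n, serial_frac, mem_overhead_per_core):
    """Assumed model: ideal Amdahl speedup minus a linear contention loss."""
    ideal = 1.0 / (serial_frac + (1.0 - serial_frac) / n)
    loss = mem_overhead_per_core * (n - 1)   # contention grows with cores
    return max(1.0, ideal - loss)

# Sweep core counts and pick the knee where contention outweighs parallelism.
best = max(range(1, 33), key=lambda n: predicted_speedup(n, 0.02, 0.5))
print(best, predicted_speedup(best, 0.02, 0.5))   # optimum well below 32 cores
```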
A workload-aware mapping approach for data-parallel programs
- Computer Science · HiPEAC
- 2011
This paper develops an approach for predicting the optimal number of threads for a given data-parallel application in the presence of external workload, as well as an alternative cooperative model that minimizes the impact on the external workload while still giving an improved average speedup.
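The cooperative flavor of this idea can be sketched in a few lines: cap the predicted ideal thread count by the cores left over by external load. The linear cap is a placeholder for the paper's learned predictor, and the load input is assumed to come from something like /proc/loadavg.

```python
import os

def best_threads(ideal_threads, external_load):
    """external_load: cores currently busy with other work (assumed input);
    returns a cooperative thread count that leaves those cores alone."""
    total = os.cpu_count() or 8          # fall back if the count is unknown
    free_cores = max(1, total - int(external_load))
    return min(ideal_threads, free_cores)

print(best_threads(ideal_threads=16, external_load=6))
```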
Parcae: a system for flexible parallel execution
- Computer Science · PLDI 2012
- 2012
Parcae, a generally applicable automatic system for platform-wide dynamic tuning, creates flexible parallel programs whose tasks can be efficiently reconfigured during execution and which outperform the original parallel implementations in many interesting scenarios.
Scalability-based manycore partitioning
- Computer Science · 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT)
- 2012
A sophisticated scheduler is developed that dynamically predicts the scalability of programs via hardware performance monitoring units, decides the optimal number of cores to allocate to each program, and performs the allocation so as to maximize system utilization and achieve fair, maximum performance.
Parallel Job Scheduling - A Status Report
- Business · JSSPP
- 2004
The purpose of the present paper is to update material on the scheduling of parallel jobs, and to extend it to include work concerning clusters and the grid.
The multikernel: a new OS architecture for scalable multicore systems
- Computer Science · SOSP '09
- 2009
This work investigates a new OS structure, the multikernel, that treats the machine as a network of independent cores, assumes no inter-core sharing at the lowest level, and moves traditional OS functionality to a distributed system of processes that communicate via message-passing.
The PARSEC benchmark suite: Characterization and architectural implications
- Computer Science · 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT)
- 2008
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.