Corpus ID: 237572290

Job Scheduling in High Performance Computing

@article{Fan2021JobSI,
  title={Job Scheduling in High Performance Computing},
  author={Yuping Fan},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.09269}
}
  • Yuping Fan
  • Published 20 September 2021
  • Computer Science
  • ArXiv
The ever-growing processing power of supercomputers in recent decades enables us to explore increasingly complex scientific problems. Effectively scheduling these jobs is crucial for individual job performance and system efficiency. Traditional job schedulers in high performance computing (HPC) are simple and concentrate on improving CPU utilization. The emergence of new hardware resources and novel hardware structures imposes severe challenges on traditional schedulers. The increasingly diverse…

References

SHOWING 1-10 OF 48 REFERENCES
Scheduling Beyond CPUs for HPC
TLDR
This study presents a multi-resource scheduling scheme named BBSched that schedules user jobs based not only on their CPU requirements but also on other schedulable resources such as burst buffers, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling.
ROME: A Multi-Resource Job Scheduling Framework for Exascale HPC Systems
TLDR
This paper proposes ROME, a novel multi-dimensional job scheduling framework that explores potential tradeoffs among multiple resources and provides balanced scheduling decisions, leveraging a genetic algorithm as the multi-dimensional optimization engine to generate fast scheduling decisions and support effective resource utilization.
Reducing Energy Costs for IBM Blue Gene/P via Power-Aware Job Scheduling
TLDR
Experiments show that the proposed power-aware job scheduling approach for HPC systems, based on variable energy prices and job power profiles, can reduce the energy cost significantly, by up to 25%, with only a slight impact on system utilization.
Hybrid Workload Scheduling on HPC Systems
TLDR
This study presents several scheduling mechanisms to address the issues involved in co-scheduling on-demand, rigid, and malleable jobs on a single HPC system, and extensively evaluates and compares their performance under various configurations and workloads.
Flux: A Next-Generation Resource Management Framework for Large HPC Centers
TLDR
This paper details the design of Flux and describes and evaluates the initial prototyping effort of the key run-time components, showing that the run-time prototype provides strong and predictable scalability.
Energy-aware job scheduler for high-performance computing
TLDR
This work presents an energy-aware scheduler that can be applied to an HPC data center without any changes in hardware; the results indicate that the scheduler is able to reduce energy consumption by 6–16%, depending on the job workload.
Multi-Resource Packing for Cluster Schedulers
Tasks in modern data-parallel clusters have highly diverse resource requirements along CPU, memory, disk and network. We present Tetris, a multi-resource cluster scheduler that packs tasks to machines…
GPU Age-Aware Scheduling to Improve the Reliability of Leadership Jobs on Titan
TLDR
This work designs techniques to increase the use of low-failure GPUs in leadership jobs through targeted resource allocation, employing two complementary approaches that update both the system ordering and the allocation mechanisms.
Practical Resource Management in Power-Constrained, High Performance Computing
TLDR
This paper proposes RMAP, a practical, low-overhead resource manager targeted at future power-constrained clusters, and designs and analyzes an adaptive policy, which derives job-level power bounds in a fair-share manner and supports overprovisioning and power-aware backfilling.
Scheduling Batch and Heterogeneous Jobs with Runtime Elasticity in a Parallel Processing Environment
TLDR
This paper proposes Delayed-LOS and Hybrid-LOS, two novel scheduling algorithms that build on and improve an existing dynamic-programming-based scheduler (LOS), which was designed only for batch jobs, by also incorporating runtime elasticity.