METAQ: Bundle Supercomputing Tasks
@article{Berkowitz2017METAQBS, title={METAQ: Bundle Supercomputing Tasks}, author={Evan Berkowitz}, journal={arXiv: Computational Physics}, year={2017} }
We describe a light-weight system of bash scripts for efficiently bundling supercomputing tasks into large jobs, so that one can take advantage of incentives or discounts for requesting large allocations. The software can backfill computational tasks, avoiding wasted cycles, and can streamline collaboration between different users. It is simple to use, functioning similarly to batch systems like PBS, MOAB, and SLURM.
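The bundling and backfilling idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not METAQ's implementation (which the abstract describes as a set of bash scripts): the names `Task` and `backfill`, the per-task node count and run-time estimate, and the greedy first-fit policy are all hypothetical choices made for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description (not METAQ's task format)."""
    name: str
    nodes: int    # nodes the task needs
    minutes: int  # estimated run time in minutes


def backfill(tasks, total_nodes, wall_minutes):
    """Greedily pack tasks into one large allocation.

    Returns (schedule, skipped), where schedule is a list of
    (start_minute, task_name) in launch order and skipped lists the
    names of tasks that never fit in the nodes or time available.
    """
    queue = list(tasks)  # pending tasks, in submission order
    running = []         # min-heap of (finish_minute, nodes)
    free = total_nodes
    now = 0
    schedule = []
    while True:
        # Launch every pending task that fits in the free nodes
        # and in the wall-clock time the allocation has left.
        still_waiting = []
        for task in queue:
            if task.nodes <= free and now + task.minutes <= wall_minutes:
                heapq.heappush(running, (now + task.minutes, task.nodes))
                free -= task.nodes
                schedule.append((now, task.name))
            else:
                still_waiting.append(task)
        queue = still_waiting
        if not running:
            break
        # Advance the clock to the next task to finish and reclaim its nodes.
        now, released = heapq.heappop(running)
        free += released
    return schedule, [task.name for task in queue]


tasks = [
    Task("a", nodes=4, minutes=60),
    Task("b", nodes=4, minutes=30),
    Task("c", nodes=2, minutes=30),
    Task("d", nodes=8, minutes=100),
]
schedule, skipped = backfill(tasks, total_nodes=8, wall_minutes=120)
print(schedule)
print(skipped)
```

In this example, task `c` is backfilled onto the nodes freed when `b` finishes, while `d` is skipped because it cannot complete within the remaining wall-clock time once eight nodes are free.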
17 Citations
Three practical workflow schedulers for easy maximum parallelism
- Computer Science, Softw. Pract. Exp.
- 2023
This work presents a complete characterization of the minimum effective task granularity for efficient scheduler usage scenarios, including simplicity of design, suitability for HPC centers, short startup time, and well‐understood per‐task overhead.
Job Management with mpi_jm
- Computer Science, ISC Workshops
- 2018
The library, mpi_jm, provides a flexible Python interface, unlocking many high-level libraries, while also tightly binding users’ executables to hardware.
Autonomous Resource Management for High Performance Datacenters
- Computer Science
- 2020
A library is developed to dynamically adjust the amount of resources used throughout the lifespan of a workflow, enabling elasticity for such applications in HPC datacenters, and an adaptive controller is defined to dynamically select the best method to perform runtime state synchronizations.
Application-aware resource management for datacenters
- Computer Science
- 2018
High Performance Computing (HPC) and Cloud Computing datacenters are extensively used to steer and solve complex problems in science, engineering, and business, such as calculating correlations and…
Hybrid Resource Management for HPC and Data Intensive Workloads
- Computer Science, 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
- 2019
The architecture of a hybrid system enabling dual-level scheduling for DI jobs in HPC infrastructures is presented, allowing efficient combination of hybrid workloads on HPC resources with increased job throughput and higher overall resource utilization.
Characterizing the Performance of Executing Many-tasks on Summit
- Computer Science, 2019 IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM)
- 2019
The performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit is characterized, and it is found that PRRTE scales better than JSM for > O(1000) tasks; PRRTE overheads are negligible; and PRRTE supports optimizations that lower the impact of overheads and enable resource utilization of 63% when executing O(16K) 1-core tasks over 404 compute nodes.
EspressoDB: A scientific database for managing high-performance computing workflows
- Computer Science, J. Open Source Softw.
- 2020
The framework provided by EspressoDB aims to support the ever increasing complexity of workflows of scientific computing at leadership computing facilities, with the goal of reducing the amount of human time required to manage the jobs, thus giving scientists more time to focus on science.
Simulating the Weak Death of the Neutron in a Femtoscale Universe with Near-Exascale Computing
- Physics, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis
- 2018
An improved algorithm that exponentially decreases the time-to-solution and an optimal application mapping through a job manager, which allows CPU and GPU jobs to be interleaved, yielding 15% of peak performance when deployed across large fractions of CORAL.
Scale setting the Möbius domain wall fermion on gradient-flowed HISQ action using the omega baryon mass and the gradient-flow scales t0 and w0
- Physics, Physical Review D
- 2021
We report on a subpercent scale determination using the omega baryon mass and gradient-flow methods. The calculations are performed on 22 ensembles of $N_f=2+1+1$ highly improved, rooted…
Scale setting the Möbius Domain Wall Fermion on gradient-flowed HISQ action using the omega baryon mass and the gradient-flow scale $w_0$
- Physics
- 2020
We report on a sub-percent scale determination using the omega baryon mass and gradient-flow methods. The calculations are performed on 22 ensembles of $N_f=2+1+1$ highly improved, rooted staggered…
References
Simple Linux Utility for Resource Management
- Computer Science
- 2009
SLURM arbitrates conflicting requests for resources by managing a queue of pending work and provides a framework for starting, executing, and monitoring work on the set of allocated nodes.
MPI: A message-passing interface standard
- Materials Science
- 1994
André Walker-Loud, mpi_jm
- 2017
Walker-Loud, mpi_jm, in preparation
- 2017
This work was supported in part by the Office of Science, Department of Energy, Office of Advanced Scientific Computing Research through the CalLat SciDAC3 grant under Award Number KB0301052
METAQ
- https://github.com/evanberkowitz/metaq
- 2016