• Corpus ID: 119102244

METAQ: Bundle Supercomputing Tasks

  • E. Berkowitz
  • Published 20 February 2017
  • Computer Science
  • arXiv: Computational Physics
We describe a light-weight system of bash scripts for efficiently bundling supercomputing tasks into large jobs, so that one can take advantage of incentives or discounts for requesting large allocations. The software can backfill computational tasks, avoiding wasted cycles, and can streamline collaboration between different users. It is simple to use, functioning similarly to batch systems like PBS, MOAB, and SLURM. 
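
The backfilling idea in the abstract can be illustrated with a small simulation. This is a minimal sketch in Python, not METAQ's actual bash implementation: the `Task` fields, the `backfill` function, and the greedy first-fit policy are all assumptions made for illustration. Whenever nodes are free, it launches any queued task that fits in both the free nodes and the remaining wall-clock time of the bundled job.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task description; METAQ's real task format is not shown here.
    name: str
    nodes: int
    minutes: int


def backfill(tasks, total_nodes, wall_minutes):
    """Greedy first-fit backfill inside one large job.

    Returns (schedule, skipped): schedule is a list of (start_minute, name),
    skipped lists the tasks that never fit before the wall clock ran out.
    """
    queue = list(tasks)
    running = []  # min-heap of (end_minute, nodes) for tasks in flight
    free = total_nodes
    now = 0
    schedule = []
    while True:
        waiting = []
        for t in queue:
            # Launch only if the task fits in free nodes and remaining time.
            if t.nodes <= free and now + t.minutes <= wall_minutes:
                free -= t.nodes
                heapq.heappush(running, (now + t.minutes, t.nodes))
                schedule.append((now, t.name))
            else:
                waiting.append(t)
        queue = waiting
        if not running:
            break
        # Jump to the next task completion and reclaim its nodes.
        now, nodes = heapq.heappop(running)
        free += nodes
    return schedule, [t.name for t in queue]


if __name__ == "__main__":
    demo = [Task("a", 4, 60), Task("b", 4, 30), Task("c", 2, 30), Task("d", 4, 45)]
    print(backfill(demo, total_nodes=8, wall_minutes=90))
```

In this toy run, the small task `c` fills nodes freed when `b` finishes, while `d` is left for a later job because it cannot complete before the wall clock expires.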

Job Management with mpi_jm

The library, mpi_jm, provides a flexible Python interface, unlocking many high-level libraries, while also tightly binding users’ executables to hardware.

Autonomous Resource Management for High Performance Datacenters

A library is developed to dynamically adjust the amount of resources used throughout the lifespan of a workflow, enabling elasticity for such applications in HPC datacenters, and an adaptive controller is defined to dynamically select the best method to perform runtime state synchronizations.

Application-aware resource management for datacenters

High Performance Computing (HPC) and Cloud Computing datacenters are extensively used to steer and solve complex problems in science, engineering, and business, such as calculating correlations and

Hybrid Resource Management for HPC and Data Intensive Workloads

The architecture of a hybrid system enabling dual-level scheduling for DI jobs in HPC infrastructures is presented, allowing efficient combination of hybrid workloads on HPC resources with increased job throughput and higher overall resource utilization.

Characterizing the Performance of Executing Many-tasks on Summit

The performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit is characterized, and it is found that PRRTE scales better than JSM for > O(1000) tasks; PRRTE overheads are negligible; and PRRTE supports optimizations that lower the impact of overheads and enable resource utilization of 63% when executing O(16K) 1-core tasks over 404 compute nodes.

EspressoDB: A scientific database for managing high-performance computing workflows

The framework provided by EspressoDB aims to support the ever increasing complexity of workflows of scientific computing at leadership computing facilities, with the goal of reducing the amount of human time required to manage the jobs, thus giving scientists more time to focus on science.

An evaluation of the CORAL interconnects

An in-depth assessment of the Summit and Sierra supercomputers' network interconnects that are based on Enhanced Data Rate (EDR) 100 Gb/s Mellanox InfiniBand finds that the new Adaptive Routing dramatically improves performance but the other new features still need improvement.

Scale setting the Möbius domain wall fermion on gradient-flowed HISQ action using the omega baryon mass and the gradient-flow scales t0 and w0

We report on a subpercent scale determination using the omega baryon mass and gradient-flow methods. The calculations are performed on 22 ensembles of $N_f=2+1+1$ highly improved, rooted

Scale setting the Möbius Domain Wall Fermion on gradient-flowed HISQ action using the omega baryon mass and the gradient-flow scale $w_0$

We report on a sub-percent scale determination using the omega baryon mass and gradient-flow methods. The calculations are performed on 22 ensembles of $N_f=2+1+1$ highly improved, rooted staggered

Progress on Meson-Baryon Scattering

Colin Morningstar, John Bulava, Andrew D. Hanlon, Ben Hörz, Daniel Mohler, Amy Nicholson, Sarah Skinner and André Walker-Loud. Department of Physics, Carnegie Mellon University, Pittsburgh, PA,



MPI: A Message-Passing Interface Standard

Simple Linux Utility for Resource Management

SLURM arbitrates conflicting requests for resources by managing a queue of pending work and provides a framework for starting, executing, and monitoring work on the set of allocated nodes.

André Walker-Loud, mpi_jm

  • 2017

Walker-Loud, mpi_jm, in preparation

  • 2017

This work was supported in part by the Office of Science, Department of Energy, Office of Advanced Scientific Computing Research through the CalLat SciDAC3 grant under Award Number KB0301052


    • https://github.com/evanberkowitz/metaq
    • 2016