Towards enabling I/O awareness in task-based programming models

@article{Elshazly2021TowardsEI,
  title={Towards enabling I/O awareness in task-based programming models},
  author={Hatem Elshazly and Jorge Ejarque and Francesc Lordan and Rosa M. Badia},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.01504}
}

References

I/O-Aware Batch Scheduling for Petascale Computing Systems
  • Zhou Zhou, Xu Yang, Z. Lan
  • Computer Science
    2015 IEEE International Conference on Cluster Computing
  • 2015
TLDR
A novel I/O-aware batch scheduling framework that can improve job performance by more than 30% while also increasing system performance, with two scheduling policies designed for different objectives: user-oriented metrics or system performance.
Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters
TLDR
Novel batch job scheduling techniques are proposed that reduce I/O contention for underprovisioned parallel file systems (PFSes), increasing the amount of science performed by scientific workloads; the techniques are integrated into Flux, a next-generation resource and job management framework.
Resource-Aware Task Scheduling
TLDR
A set of tools is provided to detect resource sensitivity and predict the performance improvements that can be achieved by resource-aware scheduling; the tools rely solely on parallel execution traces and require no instrumentation or modification of the application code.
Scheduling the I/O of HPC Applications Under Congestion
TLDR
This paper shows that the global I/O scheduler is able to reduce the effects of congestion, even on systems where burst buffers are used, and can increase the overall system throughput by up to 56%.
On the role of burst buffers in leadership-class storage systems
TLDR
It is shown that burst buffers can accelerate the application-perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application-perceived bottleneck goal.
On the use of burst buffers for accelerating data-intensive scientific workflows
TLDR
By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, it is found that using burst buffers offers read and write improvements of about an order of magnitude, and these improvements lead to increased job performance, even for long-running CPU-bound jobs.
Characterizing output bottlenecks in a supercomputer
  • Bing Xie, J. Chase, N. Podhorszki
  • Computer Science
    2012 International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2012
TLDR
This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer and uses a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals.
Parsl: Pervasive Parallel Programming in Python
TLDR
Parsl is a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism that satisfy the needs of many-task, interactive, online, and machine learning applications in fields such as biology, cosmology, and materials science.
On the Viability of Compression for Reducing the Overheads of Checkpoint/Restart-Based Fault Tolerance
TLDR
It is demonstrated that checkpoint data compression is a feasible mechanism for reducing checkpoint commit latencies and storage overheads, and the impact that checkpoint compression might have on future-generation extreme-scale systems is considered.