Leveraging burst buffer coordination to prevent I/O interference
@article{Kougkas2016LeveragingBB, title={Leveraging burst buffer coordination to prevent I/O interference}, author={Anthony Kougkas and Matthieu Dorier and Robert Latham and Robert B. Ross and Xian-he Sun}, journal={2016 IEEE 12th International Conference on e-Science (e-Science)}, year={2016}, pages={371-380} }
Concurrent accesses to the shared storage resources in current HPC machines lead to severe performance degradation caused by I/O contention. In this study, we identify some key challenges to efficiently handling interleaved data accesses, and we propose a system-wide solution to optimize global performance. We implemented and tested several I/O scheduling policies, including prioritizing specific applications by leveraging burst buffers to defer the conflicting accesses from another application…
28 Citations
Harmonia: An Interference-Aware Dynamic I/O Scheduler for Shared Non-volatile Burst Buffers
- Computer Science, Business
- 2018 IEEE International Conference on Cluster Computing (CLUSTER)
- 2018
Harmonia is introduced, a new dynamic I/O scheduler that is aware of interference, adapts to the underlying system, implements a new 2-way decision-making process and employs several scheduling policies to maximize the system efficiency and applications' performance.
Explorations of Data Swapping on Burst Buffer
- Computer Science
- 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)
- 2018
It is found that most HPC applications can still achieve full performance when using a buffer size far smaller than the total access space of the application, which can greatly reduce the required burst buffer capacity.
BBOS: Efficient HPC Storage Management via Burst Buffer Over-Subscription
- Computer Science
- 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID)
- 2020
This work adopts a BB over-subscription allocation method that lets HPC applications use the BB only during their I/O phases in order to improve BB utilization, and finds that BB utilization improves by at least 2.2x, with more stable and higher checkpoint performance than other approaches.
Mapping and scheduling HPC applications for optimizing I/O
- Computer Science
- ICS
- 2020
This work proposes to couple a novel bandwidth-aware mapping algorithm to I/O list-scheduling policies to develop a cross-layer optimization solution, and shows important gains for the simple, bandwidth-aware mapping solution that it provides compared to its non-bandwidth-aware counterpart.
Toward Managing HPC Burst Buffers Effectively: Draining Strategy to Regulate Bursty I/O Behavior
- Computer Science
- 2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
- 2017
A proactive draining scheme to manage the draining process of distributed node-local burst buffers is proposed, and an I/O provisioning model is developed to predict the minimized I/O provisioning requirement for permanent storage systems.
I/O Scheduling Strategy for Periodic Applications
- Computer Science
- TOPC
- 2019
This work shows how to take advantage of the periodic nature of HPC applications to develop efficient periodic scheduling strategies for their I/O transfers, and proves that this scheduler has the advantage of being de-centralized and thus overcoming the overhead of online schedulers, but also that it performs better than the other solutions.
Checkpointing Strategies for Shared High-Performance Computing Platforms
- Computer Science
- Int. J. Netw. Comput.
- 2019
This work considers different aspects (system-level scheduling policies and hardware) that optimize the overall performance of concurrently executing CR-based applications that share I/O resources, and shows that by combining optimal checkpointing periods with contention-aware system-level I/O scheduling strategies, this work can significantly improve overall application performance and maximize the platform throughput.
Software-defined QoS for I/O in exascale computing
- Computer Science
- CCF Trans. High Perform. Comput.
- 2019
Evaluation shows that SDQoS can effectively control the I/O bandwidth within a 5%–10% deviation and improve the performance by 20% in extreme cases.
Automatic, Application-Aware I/O Forwarding Resource Allocation
- Computer Science
- FAST
- 2019
This work implemented, evaluated, and deployed an automatic mechanism, DFRA, for application-adaptive dynamic forwarding resource allocation, which improves applications’ I/O performance by up to 18.9×, eliminates most of the inter-application I/O interference, and has saved over 200 million core-hours during its test deployment on TaihuLight for 11 months.
Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms
- Computer Science
- 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
- 2018
It is shown that combining optimal checkpointing periods with I/O scheduling strategies can provide a significant improvement on the overall application performance, thereby maximizing platform throughput and minimizing the global waste on the platform.
References
TRIO: Burst Buffer Based I/O Orchestration
- Computer Science
- 2015 IEEE International Conference on Cluster Computing
- 2015
This paper proposes a burst buffer based I/O orchestration framework, named TRIO, to intercept and reshape the bursty writes for better sequential write traffic to storage servers, and demonstrates that TRIO could efficiently utilize storage bandwidth and reduce the average job I/O time by 37% for data-intensive applications in typical checkpointing scenarios.
Scheduling the I/O of HPC Applications Under Congestion
- Computer Science
- 2015 IEEE International Parallel and Distributed Processing Symposium
- 2015
This paper shows that the global I/O scheduler is able to reduce the effects of congestion, even on systems where burst buffers are used, and can increase the overall system throughput up to 56%.
AGIOS: Application-Guided I/O Scheduling for Parallel File Systems
- Computer Science
- 2013 International Conference on Parallel and Distributed Systems
- 2013
This paper improves the performance of server-side I/O scheduling on parallel file systems by transparently including information about the applications' access patterns, obtained from traces generated by the scheduler itself, without changes in application or file system.
CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
- Computer Science
- 2014 IEEE 28th International Parallel and Distributed Processing Symposium
- 2014
Experiments show how CALCioM can be used to efficiently and transparently improve the scheduling strategy between two otherwise interfering applications, given specified metrics of machine wide efficiency.
IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination
- Computer Science
- 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
- 2010
This paper proposes a scheme, IOrchestrator, to improve I/O performance of multi-node storage systems by orchestrating I/O services among programs when such inter-data-server coordination is dynamically determined to be cost effective.
I/O Scheduling Service for Multi-Application Clusters
- Computer Science
- 2006 IEEE International Conference on Cluster Computing
- 2006
This article presents an extension of the aIOLi to address the issue of disjoint accesses generated by different concurrent applications in a cluster, and proposes a new generic framework pluggable into any I/O file system layer.
On the role of burst buffers in leadership-class storage systems
- Computer Science
- 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)
- 2012
It is shown that burst buffers can accelerate the application perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application perceived bottleneck goal.
A Novel network request scheduler for a large scale storage system
- Computer Science
- Computer Science - Research and Development
- 2009
A quantum-based, Object Based Round Robin NRS algorithm is proposed that reorders the execution of I/O requests per data object, presenting a workload to backend storage that can be optimized more easily.
CA-NFS: A congestion-aware network file system
- Computer Science
- TOS
- 2009
This work develops a holistic framework for adaptively scheduling asynchronous requests in distributed file systems and implements modifications in the Congestion-Aware Network File System (CA-NFS), an extension to the ubiquitous network file system (NFS).