An Ephemeral Burst-Buffer File System for Scientific Applications

@article{Wang2016AnEB,
  title={An Ephemeral Burst-Buffer File System for Scientific Applications},
  author={Teng Wang and Kathryn Mohror and Adam T. Moody and Kento Sato and Weikuan Yu},
  journal={SC16: International Conference for High Performance Computing, Networking, Storage and Analysis},
  year={2016},
  pages={807--818}
}
  • Teng Wang, K. Mohror, Weikuan Yu
  • Published 13 November 2016
  • Computer Science
  • SC16: International Conference for High Performance Computing, Networking, Storage and Analysis
Burst buffers are becoming an indispensable hardware resource on large-scale supercomputers to buffer the bursty I/O from scientific applications. However, there is a lack of software support for burst buffers to be efficiently shared by applications within a batch-submitted job and recycled across different batch jobs. In addition, burst buffers need to cope with a variety of challenging I/O patterns from data-intensive scientific applications. In this study, we have designed an ephemeral… 
GekkoFS — A Temporary Burst Buffer File System for HPC Applications
TLDR
GekkoFS is, therefore, able to provide scalable I/O performance and reaches millions of metadata operations already for a small number of nodes, significantly outperforming the capabilities of common parallel file systems.
MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers
  • Teng Wang, A. Moody, Weikuan Yu
  • Computer Science
    2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • 2017
TLDR
MetaKV is proposed: a key-value store that provides fast and scalable metadata management for HPC metadata workloads on distributed burst buffers. It complements the functionality of an existing key-value store with specialized metadata services that efficiently handle bursty and concurrent metadata workloads.
On the use of burst buffers for accelerating data-intensive scientific workflows
TLDR
By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, it is found that using burst buffers offers read and write improvements of about an order of magnitude, and these improvements lead to increased job performance, even for long-running CPU-bound jobs.
CDBB: an NVRAM-based burst buffer coordination system for parallel file systems
TLDR
A collaborative distributed burst buffer coordination system, named CDBB, coordinates all the available burst buffers, based on their priorities and states, to help overburdened burst buffers and maximize resource utilization.
Gfarm/BB — Gfarm File System for Node-Local Burst Buffer
TLDR
Gfarm/BB is a file system for a burst buffer that efficiently exploits node-local storage systems to improve read and write performance; it also improves metadata performance by omitting persistency and redundancy, since it is a temporal file system.
Contention-Aware Resource Scheduling for Burst Buffer Systems
TLDR
This study presents a contention-aware resource scheduling (CARS) strategy that manages the burst buffer resource to coordinate concurrent jobs and demonstrates that the proposed CARS design outperforms the existing allocation strategies and improves both job performance and system utilization.
MLBS: Transparent Data Caching in Hierarchical Storage for Out-of-Core HPC Applications
TLDR
MultiLayered Buffer Storage (MLBS), a data object container that provides novel methods for caching and prefetching data in out-of-core scientific applications to perform asynchronously expensive I/O operations on systems equipped with hierarchical storage, is introduced.
Dynamic Provisioning of Storage Resources: A Case Study with Burst Buffers
TLDR
This work proposes a proof-of-concept that is able to deploy, on-demand, a parallel file-system across intermediate storage nodes on a Cray XC50 system and shows how this mechanism can be easily extended to support more data managers and any type of intermediate storage.

References

SHOWING 1-10 OF 33 REFERENCES
BurstFS: A Distributed Burst Buffer File System for Scientific Applications
TLDR
This study proposes BurstFS, a distributed BB file system, to exploit this architecture and provide scientific applications with high and scalable performance for bursty I/O requests.
TRIO: Burst Buffer Based I/O Orchestration
TLDR
This paper proposes a burst buffer based I/O orchestration framework, named TRIO, to intercept and reshape the bursty writes for better sequential write traffic to storage servers, and demonstrates that TRIO could efficiently utilize storage bandwidth and reduce the job I/O time by 37% on average for data-intensive applications in typical checkpointing scenarios.
BurstMem: A high-performance burst buffer system for scientific applications
TLDR
The design of BurstMem is introduced, a high-performance burst buffer system that provides a storage framework with efficient storage and communication management strategies and is able to speed up the I/O performance of scientific applications by up to 8.5× on leadership computer systems.
On the role of burst buffers in leadership-class storage systems
TLDR
It is shown that burst buffers can accelerate the application perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application perceived bottleneck goal.
A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers
  • Kento Sato, K. Mohror, S. Matsuoka
  • Computer Science, Business
    2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
  • 2014
TLDR
A user-level InfiniBand-based file system (IBIO) is developed that exploits the bandwidth of burst buffers, and performance models for coordinated and uncoordinated checkpoint/restart strategies are developed and applied to investigate the best checkpoint strategy using burst buffers on future large-scale systems.
Scalable Performance of the Panasas Parallel File System
TLDR
Performance measures of I/O, metadata, and recovery operations for storage clusters that range in size from 10 to 120 storage nodes, 1 to 12 metadata nodes, and with file system client counts ranging from 1 to 100 compute nodes are presented.
Exploiting Lustre File Joining for Effective Collective IO
TLDR
Experimental results indicate that split writing and hierarchical striping can significantly improve the performance of Lustre collective IO in terms of both data transfer and management operations.
PLFS: a checkpoint filesystem for parallel applications
  • J. Bent, Garth A. Gibson, M. Wingate
  • Computer Science
    Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
  • 2009
TLDR
A virtual parallel log-structured file system that remaps an application's preferred data layout into one optimized for the underlying file system, which can reduce checkpoint time by an order of magnitude.
DASH: a Recipe for a Flash-based Data Intensive Supercomputer
TLDR
DASH achieved as much as two orders-of-magnitude speedup compared to the same applications run on traditional architectures when running data-intensive scientific applications from graph theory, biology, and astronomy.
PVFS: A Parallel File System for Linux Clusters
TLDR
The design and implementation of PVFS are described and performance results on the Chiba City cluster at Argonne are presented, both for a concurrent read/write workload and for the BTIO benchmark.