BurstMem: A high-performance burst buffer system for scientific applications
@article{Wang2014BurstMemAH, title={BurstMem: A high-performance burst buffer system for scientific applications}, author={Teng Wang and Sarp H. Oral and Yandong Wang and Bradley W. Settlemyer and Scott Atchley and Weikuan Yu}, journal={2014 IEEE International Conference on Big Data (Big Data)}, year={2014}, pages={71-79} }
The growth of computing power on large-scale systems requires commensurate high-bandwidth I/O systems. Many parallel file systems are designed to provide fast sustainable I/O in response to applications' soaring requirements. To meet this need, a novel system is imperative to temporarily buffer the bursty I/O and gradually flush datasets to long-term parallel file systems. In this paper, we introduce the design of BurstMem, a high-performance burst buffer system. BurstMem provides a storage…
69 Citations
Development of a Burst Buffer System for Data-Intensive Applications
- Computer Science
- ArXiv
- 2015
The initial results demonstrate that utilizing a burst buffer system on top of the Lustre filesystem shows promise for dealing with the intense I/O traffic generated by application checkpointing.
An Ephemeral Burst-Buffer File System for Scientific Applications
- Computer Science
- SC16: International Conference for High Performance Computing, Networking, Storage and Analysis
- 2016
This study has designed an ephemeral Burst Buffer File System (BurstFS) that supports scalable and efficient aggregation of I/O bandwidth from burst buffers while having the same life cycle as a batch-submitted job.
Integration of Burst Buffer in High-level Parallel I/O Library for Exa-scale Computing Era
- Computer Science
- 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS)
- 2018
An I/O driver in PnetCDF is developed that uses a log-based format to store individual I/O requests on the burst buffer, showing that I/O aggregation is a promising role for burst buffers in high-level I/O libraries.
BurstFS: A Distributed Burst Buffer File System for Scientific Applications
- Computer Science
- 2016
This study proposes BurstFS, a distributed BB file system, to exploit this architecture and provide scientific applications with high and scalable performance for bursty I/O requests.
Managing I/O Interference in a Shared Burst Buffer System
- Computer Science
- 2016 45th International Conference on Parallel Processing (ICPP)
- 2016
It is shown that scheduling techniques tuned to BBs can control interference and significant performance benefits can be achieved.
Accelerating a Burst Buffer Via User-Level I/O Isolation
- Computer Science
- 2017 IEEE International Conference on Cluster Computing (CLUSTER)
- 2017
Burst buffers tolerate I/O spikes in High-Performance Computing environments by using a non-volatile flash technology. Burst buffers are commonly located between parallel file systems and compute…
On the use of burst buffers for accelerating data-intensive scientific workflows
- Computer Science
- WORKS@SC
- 2017
By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, it is found that using burst buffers offers read and write improvements of about an order of magnitude, and these improvements lead to increased job performance, even for long-running CPU-bound jobs.
Explorations of Data Swapping on Burst Buffer
- Computer Science
- 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)
- 2018
It is found that most HPC applications can still achieve full performance when using a buffer size that is far less than the total access space of the application, which can lead to a huge reduction in the required capacity for burst buffers.
To share or not to share: comparing burst buffer architectures
- Business, Computer Science
- SpringSim
- 2017
These studies validate previous results indicating that storage systems without parity protection can reduce overall time to solution, and determine that shared burst buffer organizations can result in a 3.5× greater average application I/O throughput compared to local burst buffer configurations.
TRIO: Burst Buffer Based I/O Orchestration
- Computer Science
- 2015 IEEE International Conference on Cluster Computing
- 2015
This paper proposes a burst buffer based I/O orchestration framework, named TRIO, to intercept and reshape the bursty writes for better sequential write traffic to storage servers, and demonstrates that TRIO could efficiently utilize storage bandwidth and reduce the average job I/O time by 37% for data-intensive applications in typical checkpointing scenarios.
References
On the role of burst buffers in leadership-class storage systems
- Computer Science
- 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)
- 2012
It is shown that burst buffers can accelerate the application perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application perceived bottleneck goal.
Characterizing output bottlenecks in a supercomputer
- Computer Science
- 2012 International Conference for High Performance Computing, Networking, Storage and Analysis
- 2012
This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer and uses a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals.
Lustre: A Scalable, High-Performance File System Cluster
- Computer Science
- 2003
The Lustre File System, an open source, high-performance file system from Cluster File Systems, Inc., is a distributed file system that eliminates the performance, availability, and scalability problems that are present in many traditional distributed file systems.
Scalable I/O forwarding framework for high-performance computing systems
- Computer Science
- 2009 IEEE International Conference on Cluster Computing and Workshops
- 2009
An I/O protocol and API for shipping function calls from compute nodes to I/O nodes are described, and a quantitative analysis of the overhead associated with I/O forwarding is presented.
Jitter-free co-processing on a prototype exascale storage stack
- Computer Science
- 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)
- 2012
This paper argues that the IO nodes are the appropriate location for HPC workloads and shows results from a prototype system that is built accordingly, showing that the prototype system reduces total time to completion by up to 30%.
Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application
- Computer Science
- 2013 22nd International Conference on Computer Communication and Networks (ICCCN)
- 2013
This paper adopts a mission-critical scientific application, GEOS-5, as a case to profile and analyze the communication and I/O issues that are preventing applications from fully utilizing the underlying parallel storage systems, and redesigns its I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance.
PVFS: A Parallel File System for Linux Clusters
- Computer Science
- Annual Linux Showcase & Conference
- 2000
The design and implementation of PVFS are described and performance results on the Chiba City cluster at Argonne are presented, both for a concurrent read/write workload and for the BTIO benchmark.
Workload characterization of a leadership class storage cluster
- Computer Science
- 2010 5th Petascale Data Storage Workshop (PDSW '10)
- 2010
This paper characterizes the scientific workloads of the world's fastest HPC (High Performance Computing) storage cluster, Spider, at the Oak Ridge Leadership Computing Facility (OLCF), and shows that the read and write I/O bandwidth usage as well as the inter-arrival time of requests can be modeled as a Pareto distribution.
GPFS: A Shared-Disk File System for Large Computing Clusters
- Computer Science
- FAST
- 2002
GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. The paper discusses how distributed locking and recovery techniques were extended to scale to large clusters.
Scaling parallel I/O performance through I/O delegate and caching system
- Computer Science
- 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
- 2008
A portable MPI-IO layer is proposed where certain tasks, such as file caching, consistency control, and collective I/O optimization, are delegated to a small set of compute nodes, collectively termed I/O Delegate nodes, which alleviates the lock contention at I/O servers.