BurstMem: A high-performance burst buffer system for scientific applications

@inproceedings{Wang2014BurstMemAH,
  title={BurstMem: A high-performance burst buffer system for scientific applications},
  author={Teng Wang and Sarp H. Oral and Yandong Wang and Bradley W. Settlemyer and Scott Atchley and Weikuan Yu},
  booktitle={2014 IEEE International Conference on Big Data (Big Data)},
  year={2014},
  pages={71--79}
}
The growth of computing power on large-scale systems requires commensurately high-bandwidth I/O systems. While many parallel file systems are designed to provide fast, sustained I/O in response to applications' soaring requirements, a complementary system is needed to temporarily buffer bursty I/O and gradually flush datasets to long-term parallel file systems. In this paper, we introduce the design of BurstMem, a high-performance burst buffer system. BurstMem provides a storage…
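The abstract describes the core burst-buffer pattern: absorb bursty writes on fast intermediate storage, then drain them to the slower parallel file system in the background. A minimal write-behind sketch of that general idea (not BurstMem's actual implementation; `BurstBuffer`, `backing_store`, and the in-memory queue standing in for NVRAM are all illustrative assumptions):

```python
import queue
import threading
import time

class BurstBuffer:
    """Illustrative write-behind buffer: absorbs bursty writes quickly,
    then drains them to slower long-term storage in the background.
    A sketch of the general pattern only, not BurstMem's design."""

    def __init__(self, backing_store):
        self.backing_store = backing_store   # dict standing in for a parallel file system
        self._queue = queue.Queue()          # stand-in for fast burst-buffer storage
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def write(self, key, data):
        # Fast path: enqueue and return immediately, like landing on NVRAM.
        self._queue.put((key, data))

    def _drain(self):
        # Slow path: gradually flush buffered writes to the backing store.
        while True:
            key, data = self._queue.get()
            time.sleep(0.01)                 # simulate a slower backing-store write
            self.backing_store[key] = data
            self._queue.task_done()

    def flush(self):
        # Block until everything buffered has reached long-term storage.
        self._queue.join()

pfs = {}
bb = BurstBuffer(pfs)
for i in range(5):
    bb.write(f"ckpt-{i}", b"checkpoint data")  # bursty writes return quickly
bb.flush()
print(sorted(pfs))  # → ['ckpt-0', 'ckpt-1', 'ckpt-2', 'ckpt-3', 'ckpt-4']
```

The point of the split fast/slow path is that application checkpoint time is bounded by the buffer's bandwidth rather than the file system's, which is the behavior the papers below evaluate.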
Development of a Burst Buffer System for Data-Intensive Applications
TLDR
The initial results demonstrate that utilizing a burst buffer system on top of the Lustre filesystem shows promise for dealing with the intense I/O traffic generated by application checkpointing.
An Ephemeral Burst-Buffer File System for Scientific Applications
TLDR
This study has designed an ephemeral Burst Buffer File System (BurstFS) that supports scalable and efficient aggregation of I/O bandwidth from burst buffers while having the same life cycle as a batch-submitted job.
Integration of Burst Buffer in High-level Parallel I/O Library for Exa-scale Computing Era
  • Kai-yuan Hou, Reda Al-Bahrani, W. Liao
  • Computer Science
    2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS)
  • 2018
TLDR
An I/O driver in PnetCDF is developed that uses a log-based format to store individual I/O requests on the burst buffer, and shows that I/O aggregation is a promising role for burst buffers in high-level I/O libraries.
BurstFS: A Distributed Burst Buffer File System for Scientific Applications
TLDR
This study proposes BurstFS, a distributed BB file system, to exploit this architecture and provide scientific applications with high and scalable performance for bursty I/O requests.
Managing I/O Interference in a Shared Burst Buffer System
TLDR
It is shown that scheduling techniques tuned to BBs can control interference and significant performance benefits can be achieved.
Accelerating a Burst Buffer Via User-Level I/O Isolation
Burst buffers tolerate I/O spikes in High-Performance Computing environments by using non-volatile flash technology. Burst buffers are commonly located between parallel file systems and compute nodes…
On the use of burst buffers for accelerating data-intensive scientific workflows
TLDR
By running a subset of the SCEC CyberShake workflow, a production seismic hazard analysis workflow, it is found that using burst buffers offers read and write improvements of about an order of magnitude, and these improvements lead to increased job performance, even for long-running CPU-bound jobs.
Explorations of Data Swapping on Burst Buffer
  • T. Xu, Kento Sato, S. Matsuoka
  • Computer Science
    2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS)
  • 2018
TLDR
It is found that most HPC applications can still achieve full performance when using a buffer size that is far less than the total access space of the application, which can lead to a large reduction in the required burst buffer capacity.
To share or not to share: comparing burst buffer architectures
TLDR
These studies validate previous results indicating that storage systems without parity protection can reduce overall time to solution, and determine that shared burst buffer organizations can result in a 3.5× greater average application I/O throughput compared to local burst buffer configurations.
TRIO: Burst Buffer Based I/O Orchestration
TLDR
This paper proposes a burst buffer based I/O orchestration framework, named TRIO, to intercept and reshape bursty writes into better sequential write traffic to storage servers, and demonstrates that TRIO could efficiently utilize storage bandwidth and reduce average job I/O time by 37% for data-intensive applications in typical checkpointing scenarios.

References

Showing 1-10 of 33 references
On the role of burst buffers in leadership-class storage systems
TLDR
It is shown that burst buffers can accelerate the application-perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application-perceived throughput goal.
Characterizing output bottlenecks in a supercomputer
  • Bing Xie, J. Chase, N. Podhorszki
  • Computer Science
    2012 International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2012
TLDR
This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer and uses a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals.
Lustre: A Scalable, High-Performance File System Cluster
  • Computer Science
  • 2003
TLDR
The Lustre File System, an open-source, high-performance distributed file system from Cluster File Systems, Inc., eliminates the performance, availability, and scalability problems present in many traditional distributed file systems.
Scalable I/O forwarding framework for high-performance computing systems
TLDR
An I/O protocol and API for shipping function calls from compute nodes to I/O nodes are described, and a quantitative analysis of the overhead associated with I/O forwarding is presented.
Jitter-free co-processing on a prototype exascale storage stack
TLDR
This paper argues that the I/O nodes are the appropriate location for HPC workloads and presents results from a prototype system built accordingly, showing that it reduces total time to completion by up to 30%.
Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application
  • Zhuo Liu, Bin Wang, S. Klasky
  • Computer Science
    2013 22nd International Conference on Computer Communication and Networks (ICCCN)
  • 2013
TLDR
This paper adopts a mission-critical scientific application, GEOS-5, as a case to profile and analyze the communication and I/O issues that prevent applications from fully utilizing the underlying parallel storage systems, and redesigns its I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance.
PVFS: A Parallel File System for Linux Clusters
TLDR
The design and implementation of PVFS are described and performance results on the Chiba City cluster at Argonne are presented, both for a concurrent read/write workload and for the BTIO benchmark.
Workload characterization of a leadership class storage cluster
TLDR
This paper characterizes the scientific workloads of the world's fastest HPC (High Performance Computing) storage cluster, Spider, at the Oak Ridge Leadership Computing Facility (OLCF), and shows that read and write I/O bandwidth usage, as well as the inter-arrival time of requests, can be modeled as a Pareto distribution.
GPFS: A Shared-Disk File System for Large Computing Clusters
TLDR
GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters; the paper discusses how distributed locking and recovery techniques were extended to scale to large clusters.
Scaling parallel I/O performance through I/O delegate and caching system
  • Arifa Nisar, W. Liao, A. Choudhary
  • Computer Science
    2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2008
TLDR
A portable MPI-IO layer is proposed in which certain tasks, such as file caching, consistency control, and collective I/O optimization, are delegated to a small set of compute nodes, collectively termed I/O Delegate nodes, which alleviates lock contention at the I/O servers.