UniviStor: Integrated Hierarchical and Distributed Storage for HPC

@inproceedings{Wang2018UniviStorIH,
  title={UniviStor: Integrated Hierarchical and Distributed Storage for HPC},
  author={Teng Wang and Surendra Byna and Bin Dong and Houjun Tang},
  booktitle={2018 IEEE International Conference on Cluster Computing (CLUSTER)},
  year={2018},
  pages={134--144}
}
  • Published 1 September 2018
  • Computer Science
High performance computing (HPC) architectures have been adding new layers of storage, such as burst buffers, to tolerate latency between memory and disk-based file systems. However, existing file system and burst buffer management software typically manage each storage layer separately. As a result, the burden of moving data across multiple layers falls upon HPC system users. To hide the complexity of managing the scattered storage devices from applications, we introduce UniviStor, a data… 
MLBS: Transparent Data Caching in Hierarchical Storage for Out-of-Core HPC Applications
TLDR
MultiLayered Buffer Storage (MLBS), a data object container that provides novel methods for caching and prefetching data in out-of-core scientific applications to perform expensive I/O operations asynchronously on systems equipped with hierarchical storage, is introduced.
A BeeGFS-Based Caching File System for Data-Intensive Parallel Computing
TLDR
The solution unifies data access for both the internal storage and external file systems using a uniform namespace, improves storage performance by exploiting data locality across storage tiers, and increases data sharing between compute nodes and across applications.
I/O Acceleration via Multi-Tiered Data Buffering and Prefetching
TLDR
Hermes enables, manages, supervises, and, in some sense, extends I/O buffering to fully integrate into the DMSH, and introduces three novel data placement policies to efficiently utilize all layers and three novel techniques to perform memory, metadata, and communication management in hierarchical buffering systems.
ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage
TLDR
A new array caching in hierarchical storage (ARCHIE) is introduced to accelerate array data analysis in a seamless fashion; ARCHIE outperforms state-of-the-art file systems, i.e., Lustre and DataWarp, on a production supercomputing system by up to 5.8× for data accesses by scientific analysis applications.
HCompress: Hierarchical Data Compression for Multi-Tiered Storage Environments
TLDR
HCompress is a hierarchical data compression library that can improve application performance by harmoniously leveraging both multi-tiered storage and data compression; it includes a novel compression selection algorithm that facilitates the optimal matching of compression libraries to the storage tiers.
Efficient Data Eviction across Multiple Tiers of Storage
TLDR
RFlush is a real-time data flushing platform for multi-tiered storage environments that provides low latency and autoscaling capabilities, along with an efficient pipeline for continuous data flushing operations to enable high resource utilization.
HFetch: Hierarchical Data Prefetching for Scientific Workflows in Multi-Tiered Storage Environments
TLDR
HFetch is presented, a truly hierarchical data prefetcher that adopts a server-push approach to data prefetching; it shows 10-35% performance gains over existing prefetchers and over 50% when compared to systems with no prefetching.
Interfacing HDF5 with a scalable object‐centric storage system on hierarchical storage
TLDR
An HDF5 VOL connector that interfaces with PDC is developed and its performance is evaluated on Cori, a Cray XC40 supercomputer located at the National Energy Research Scientific Computing Center (NERSC).
Maximizing I/O Bandwidth for Reverse Time Migration on Heterogeneous Large-Scale Systems
TLDR
An extension to the Multilayer Buffer System framework is introduced to further maximize RTM I/O bandwidth in the presence of GPU hardware accelerators and to leverage the GPU's High Bandwidth Memory (HBM) as an additional storage media layer.
Design and Implementation of the Tianhe-2 Data Storage and Management System
TLDR
Light is shed on how to enable application-driven data management as a preliminary step toward the deep convergence of exascale computing ecosystems, big data, and AI.

References

Showing 1-10 of 44 references
Data Elevator: Low-Contention Data Movement in Hierarchical Storage System
  • Bin Dong, S. Byna, N. Keen
  • Computer Science
    2016 IEEE 23rd International Conference on High Performance Computing (HiPC)
  • 2016
TLDR
This paper proposes a new system, named Data Elevator, for transparently and efficiently moving data in hierarchical storage, which reduces the resource contention on BB servers via offloading the data movement from a fixed number of BB server nodes to compute nodes.
Toward Scalable and Asynchronous Object-Centric Data Management for HPC
  • Houjun Tang, S. Byna, R. Warren
  • Computer Science
    2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
  • 2018
TLDR
This paper formulates object-centric data abstractions, named Proactive Data Containers (PDC), and their mappings in different levels of the storage hierarchy; the system achieves comparable performance with HDF5 and PLFS in reading and writing data at small scale, and outperforms them at scales larger than 10K cores.
Hermes: a heterogeneous-aware multi-tiered distributed I/O buffering system
TLDR
The design and implementation of Hermes is presented, showing that, in addition to automatic data movement through the hierarchy, Hermes can significantly accelerate I/O and outperforms state-of-the-art buffering platforms by more than 2x.
FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems
TLDR
FusionFS has been deployed and evaluated on up to 16K compute nodes of an IBM Blue Gene/P supercomputer, showing more than an order of magnitude performance improvement over other popular file systems such as GPFS, PVFS, and HDFS.
MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers
  • Teng Wang, A. Moody, Weikuan Yu
  • Computer Science
    2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • 2017
TLDR
MetaKV is proposed: a key-value store that provides fast and scalable metadata management for HPC workloads on distributed burst buffers, complementing the functionality of an existing key-value store with specialized metadata services that efficiently handle bursty and concurrent metadata workloads.
TRIO: Burst Buffer Based I/O Orchestration
TLDR
This paper proposes a burst buffer based I/O orchestration framework, named TRIO, to intercept and reshape bursty writes into better sequential write traffic to storage servers, and demonstrates that TRIO can efficiently utilize storage bandwidth and reduce average job I/O time by 37% for data-intensive applications in typical checkpointing scenarios.
Lustre: A Scalable, High-Performance File System
  • Cluster File Systems, Inc.
  • Computer Science
  • 2003
TLDR
The Lustre File System, an open source, high-performance file system from Cluster File Systems, Inc., is a distributed file system that eliminates the performance, availability, and scalability problems that are present in many traditional distributed file systems.
SSDUP: a traffic-aware ssd burst buffer for HPC systems
TLDR
This paper proposes a scheme, called SSDUP (a traffic-aware SSD burst buffer), to improve the burst buffer by addressing its limitations, and develops a novel traffic-detection method to detect the randomness in the write traffic.
A 1 PB/s file system to checkpoint three million MPI tasks
TLDR
A novel user-space file system that stores data in main memory and transparently spills over to other storage, like local flash memory or the parallel file system, as needed, which extends the reach of libraries like SCR to systems where they otherwise could not be used.
Exploiting Lustre File Joining for Effective Collective IO
TLDR
Experimental results indicate that split writing and hierarchical striping can significantly improve the performance of Lustre collective IO in terms of both data transfer and management operations.