MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers

@article{Wang2017MetaKVAK,
  title={MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers},
  author={Teng Wang and Adam T. Moody and Yue Zhu and Kathryn Mohror and Kento Sato and Tanzima Zerin Islam and Weikuan Yu},
  journal={2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)},
  year={2017},
  pages={1174-1183}
}
  • Teng Wang, A. Moody, Weikuan Yu
  • Published 1 May 2017
  • Computer Science
  • 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Distributed burst buffers are a promising storage architecture for handling I/O workloads for exascale computing. Their aggregate storage bandwidth grows linearly with system node count. However, although scientific applications can achieve scalable write bandwidth by having each process write to its node-local burst buffer, metadata challenges remain formidable, especially for files shared across many processes. This is due to the need to track and organize file segments across the distributed… 
UniviStor: Integrated Hierarchical and Distributed Storage for HPC
TLDR
UniviStor is introduced, a data management service offering a unified view of storage layers that provides performance optimizations and data structures tailored for distributed and hierarchical data placement, interference-aware data movement scheduling, adaptive data striping, and lightweight workflow management.
Gfarm/BB — Gfarm File System for Node-Local Burst Buffer
TLDR
Gfarm/BB is a file system for burst buffers that efficiently exploits node-local storage to improve read and write performance; because it is a temporary file system, it improves metadata performance by omitting persistency and redundancy.
A BeeGFS-Based Caching File System for Data-Intensive Parallel Computing
TLDR
The solution unifies data access for both the internal storage and external file systems using a uniform namespace, improves storage performance by exploiting data locality across storage tiers, and increases data sharing between compute nodes and across applications.
SoMeta: Scalable Object-Centric Metadata Management for High Performance Computing
TLDR
SoMeta is presented, a scalable and decentralized metadata management approach for object-centric storage in HPC systems that provides a flat namespace that is dynamically partitioned, a tagging approach to manage metadata that can be efficiently searched and updated, and a light-weight and fault tolerant management strategy.
ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage
TLDR
ARCHIE, a new array caching scheme in hierarchical storage, is introduced to accelerate array data analysis in a seamless fashion; results show that ARCHIE outperforms state-of-the-art file systems, i.e., Lustre and DataWarp, on a production supercomputing system by up to 5.8× in accessing data by scientific analysis applications.
Optimizing the SSD Burst Buffer by Traffic Detection
TLDR
A novel method to detect and quantify the data randomness in the write traffic is developed and an adaptive algorithm is proposed to classify the random writes dynamically, and a pipeline mechanism for the SSD buffer is proposed, in which data buffering and flushing are performed in pipeline.
Compact Filters for Fast Online Data Partitioning
TLDR
FilterKV is presented, an efficient data management scheme for fast online data partitioning of key-value (KV) pairs that reduces the total amount of data sent over the network and to storage by using a compact format to represent and store KV pointers.
Efficient User-Level Storage Disaggregation for Deep Learning
TLDR
This paper examines the I/O patterns of deep neural networks and reveals their critical need of loading many small samples randomly for successful training, and designs a specialized Deep Learning File System (DLFS) that achieves efficient user-level storage disaggregation with very little CPU utilization.
Compact Filter Structures for Fast Data Partitioning
TLDR
FilterKV is presented, a data management scheme for faster online data partitioning of key-value (KV) pair data that reduces the amount of data shuffled over the network by: (a) moving KV pairs quickly off the network to storage, and (b) using an extremely compact representation to represent each KV pair in the communication occurring over the network.
Software-defined storage for fast trajectory queries using a deltaFS indexed massive directory
TLDR
This paper introduces the Indexed Massive Directory, a new technique for indexing data within DeltaFS, enabling in-situ indexing of massive amounts of data written to a single directory simultaneously, and in an arbitrarily large number of files.

References

IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion
TLDR
This paper introduces a middleware design called IndexFS that adds support to existing file systems such as PVFS, Lustre, and HDFS for scalable high-performance operations on metadata and small files, and proposes two client-based storm-free caching techniques.
An Ephemeral Burst-Buffer File System for Scientific Applications
TLDR
This study has designed an ephemeral Burst Buffer File System (BurstFS) that supports scalable and efficient aggregation of I/O bandwidth from burst buffers while having the same life cycle as a batch-submitted job.
DASH: a Recipe for a Flash-based Data Intensive Supercomputer
TLDR
DASH achieved as much as two orders of magnitude speedup compared to the same applications run on traditional architectures when running data-intensive scientific applications from graph theory, biology, and astronomy.
ZHT: A Light-Weight Reliable Persistent Dynamic Scalable Zero-Hop Distributed Hash Table
TLDR
This paper presents ZHT, a zero-hop distributed hash table tuned for the requirements of high-end computing systems, compares it against other distributed hash tables and key/value stores, and finds that it offers superior performance for the features and portability it supports.
Dynamic Metadata Management for Petabyte-Scale File Systems
TLDR
This work presents a dynamic subtree partitioning and adaptive metadata management system designed to efficiently manage hierarchical metadata workloads that evolve over time.
MDHIM: A Parallel Key/Value Framework for HPC
TLDR
An HPC-specific key-value store called the Multi-Dimensional Hierarchical Indexing Middleware (MDHIM) is built, and it is found that MDHIM performance more than triples that of Cassandra on HPC systems.
IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination
  • Xuechen Zhang, K. Davis, Song Jiang
  • Computer Science
    2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2010
TLDR
This paper proposes a scheme, IOrchestrator, to improve I/O performance of multi-node storage systems by orchestrating I/O services among programs when such inter-data-server coordination is dynamically determined to be cost effective.
GIGA+: scalable directories for shared file systems
TLDR
This work builds scalable directories for cluster storage, i.e., directories that can store billions to trillions of entries and handle hundreds of thousands of operations per second.
PLFS: a checkpoint filesystem for parallel applications
  • J. Bent, Garth A. Gibson, M. Wingate
  • Computer Science
    Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
  • 2009
TLDR
A virtual parallel log-structured file system that remaps an application's preferred data layout into one optimized for the underlying file system, which can reduce checkpoint time by an order of magnitude.
I/O acceleration with pattern detection
TLDR
This work develops and evaluates algorithms by which I/O patterns can be efficiently discovered and described and implements one such algorithm to reduce the metadata quantity in a virtual parallel file system by up to several orders of magnitude, thereby increasing the performance of writes and reads.