Hybrid hierarchy storage system in MilkyWay-2 supercomputer

  • Weixia Xu, Yutong Lu, Qiong Li, Enqiang Zhou, Zhenlong Song, Yong Dong, Wei Zhang, Dengping Wei, Xiaoming Zhang, Haitao Chen, Jianying Xing, Yuan Yuan
  • Computer Science
    Frontiers of Computer Science
With the rapid improvement of computation capability in high-performance supercomputer systems, the performance imbalance between the computation subsystem and the storage subsystem has become increasingly serious, especially as applications produce big data ranging from tens of gigabytes up to terabytes. To reduce this gap, large-scale storage systems need to be designed and implemented with high performance and scalability. The MilkyWay-2 (TH-2) supercomputer system, with a peak performance of 54.9 Pflops…

Achieving High Reliability and Efficiency in Maintaining Large-Scale Storage Systems through Optimal Resource Provisioning and Data Placement

This dissertation proposes a holistic algorithm that adaptively predicts the popularity of data objects by leveraging the temporal locality in their access patterns and adjusts their placement between solid-state drives and regular hard disk drives, improving both the data access throughput and the storage space efficiency of large-scale heterogeneous storage systems.

WatCache: a workload-aware temporary cache on the compute side of HPC systems

This paper designs a workload-aware node allocation method that assigns fast storage devices to jobs according to their I/O requirements and merges each job's devices into a separate temporary cache space; it also implements a metadata caching strategy that reduces the metadata overhead of I/O requests to improve the performance of small I/O.

Memory-Efficient and Skew-Tolerant MapReduce Over MPI for Supercomputing Systems

Data analytics has become an integral part of large-scale scientific computing. Among various data analytics frameworks, MapReduce has gained the most traction. Although some efforts have been made…

Design and Implementation of the Tianhe-2 Data Storage and Management System

Light is shed on how to enable application-driven data management as a preliminary step toward the deep convergence of exascale computing ecosystems, big data, and AI.

An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers

Analysis of balance ratios and architectural trends in the world’s most powerful supercomputers between 1993 and 2019 reveals that balance ratios of the various subsystems need to be considered carefully alongside the application workload portfolio to provision the subsystem capacity and bandwidth specifications, which can help achieve optimal performance.

Erasure code of small file in a distributed file system

  • Xinhai Chen, Jie Liu, P. Xie
  • Computer Science
    2017 3rd IEEE International Conference on Computer and Communications (ICCC)
  • 2017
This work provides a distributed file system for storing large amounts of small files, and introduces erasure coding, an alternative that offers the same data protection as replication while significantly reducing storage consumption.
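The storage saving claimed above follows from simple arithmetic: a (k, m) erasure code stores (k + m)/k raw bytes per logical byte, versus n raw bytes under n-way replication. A minimal sketch (function names are illustrative, not from the paper):

```python
# Sketch: storage overhead of (k, m) erasure coding vs. n-way replication.

def replication_overhead(copies: int) -> float:
    """Raw bytes stored per logical byte under n-way replication."""
    return float(copies)

def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per logical byte with k data + m parity blocks."""
    return (k + m) / k

# A (6, 3) code tolerates the loss of any 3 blocks, comparable in
# protection to 3-way replication, but stores 1.5x instead of 3x the data.
print(replication_overhead(3))   # 3.0
print(erasure_overhead(6, 3))    # 1.5
```

For small files the trade-off is subtler, since per-object coding adds metadata and repair traffic, which is the regime this paper targets.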

A Checkpoint of Research on Parallel I/O for High-Performance Computing

This survey article focuses on a traditional I/O stack, with a POSIX parallel file system, and aims at identifying the general characteristics of the field and the main current and future research topics.

Improving I/O performance for High Performance Computing with Application Forwarding Layer

This paper presents an approach that uses dedicated compute nodes to process requests from different applications in order to optimize I/O performance; the role played by these nodes is called the logical I/O forwarding layer, which corresponds to the I/O forwarding layer of the physical structure.

End-to-end I/O Monitoring on a Leading Supercomputer

Beacon, an end-to-end I/O resource monitoring and diagnosis system for the 40960-node Sunway TaihuLight supercomputer, has successfully helped center administrators identify obscure design or configuration flaws, system anomaly occurrences, I/O performance interference, and resource under- or over-provisioning problems.

Scalable Performance of the Panasas Parallel File System

Performance measures of I/O, metadata, and recovery operations for storage clusters that range in size from 10 to 120 storage nodes, 1 to 12 metadata nodes, and with file system client counts ranging from 1 to 100 compute nodes are presented.

On the role of burst buffers in leadership-class storage systems

It is shown that burst buffers can accelerate the application-perceived throughput to the external storage system and can reduce the amount of external storage bandwidth required to meet a desired application-perceived throughput goal.
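The bandwidth reduction described above can be illustrated with a back-of-envelope model: a burst buffer absorbs a checkpoint at full speed and drains it to the parallel file system over the whole compute interval, so the external tier only needs to sustain the average rate rather than the burst rate. A hedged sketch with made-up numbers (not figures from the paper):

```python
# Sketch: external (PFS) bandwidth needed with and without a burst buffer.

def required_pfs_bandwidth(burst_gb: float, write_window_s: float,
                           compute_interval_s: float,
                           has_burst_buffer: bool) -> float:
    """GB/s the external store must sustain so the application never stalls."""
    if has_burst_buffer:
        # The burst lands in the buffer, then drains over the full interval
        # between checkpoints.
        return burst_gb / compute_interval_s
    # Without a buffer, the PFS must absorb the burst within the write window.
    return burst_gb / write_window_s

# 10 TB checkpoint, 100 s acceptable write window, checkpoint every 3600 s:
print(required_pfs_bandwidth(10_000, 100, 3600, False))  # 100.0 GB/s
print(required_pfs_bandwidth(10_000, 100, 3600, True))   # ~2.8 GB/s
```

The roughly 36x gap between the two cases is what lets centers provision far less external bandwidth when a burst-buffer tier is present.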

Lustre: A Scalable, High-Performance File System Cluster

  • Computer Science
  • 2003
The Lustre File System, an open source, high-performance file system from Cluster File Systems, Inc., is a distributed file system that eliminates the performance, availability, and scalability problems that are present in many traditional distributed file systems.

The parallel I/O architecture of the high-performance storage system (HPSS)

  • R. Watson, R. Coyne
  • Computer Science
    Proceedings of IEEE 14th Symposium on Mass Storage Systems
  • 1995
This paper describes the parallel I/O architecture and mechanisms, parallel transport protocol (PTP), parallel FTP, and parallel client application programming interface (API) used by the high-performance storage system (HPSS).

File Creation Strategies in a Distributed Metadata File System

This paper presents designs that reduce the message complexity of the create operation and increase performance. Compared to the base-case create protocol implemented in PVFS, the design delivers near-constant operation latency as the system scales, does not degenerate under high-contention situations, and increases throughput linearly as the number of metadata servers increases.

Adaptive and scalable metadata management to support a trillion files

This work presents a scalable and adaptive metadata management system which aims to maintain a trillion files efficiently and exploits an adaptive two-level directory partitioning based on extendible hashing to manage very large directories.
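The adaptive directory partitioning mentioned above rests on extendible hashing: entries are assigned to partitions by the top bits of a filename hash, and the partition table grows as directories fill. A simplified sketch of that idea (global table doubling only, omitting the per-bucket local depths of full extendible hashing; all names and thresholds are illustrative, not the paper's implementation):

```python
# Sketch: extendible-hash-style directory partitioning.
import hashlib

SPLIT_THRESHOLD = 4  # max entries per partition before the table doubles

class ExtendibleDirectory:
    def __init__(self):
        self.global_depth = 1
        self.partitions = [set() for _ in range(2)]

    def _index(self, name: str) -> int:
        # Top `global_depth` bits of a 64-bit hash select the partition.
        h = int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")
        return h >> (64 - self.global_depth)

    def insert(self, name: str):
        idx = self._index(name)
        self.partitions[idx].add(name)
        if len(self.partitions[idx]) > SPLIT_THRESHOLD:
            self._double()

    def _double(self):
        # Double the table and rehash: each entry moves to the partition
        # selected by one more bit of its hash.
        self.global_depth += 1
        old = self.partitions
        self.partitions = [set() for _ in range(2 ** self.global_depth)]
        for part in old:
            for name in part:
                self.partitions[self._index(name)].add(name)

    def lookup(self, name: str) -> bool:
        return name in self.partitions[self._index(name)]

d = ExtendibleDirectory()
for i in range(100):
    d.insert(f"file{i:03d}.dat")
print(d.global_depth, d.lookup("file042.dat"))
```

Because the partition index is recomputed from the hash after every doubling, lookups stay O(1) while very large directories spread across more and more partitions (or, in the paper's setting, metadata servers).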

Using server-to-server communication in parallel file systems to simplify consistency and improve performance

  • P. Carns, B. Settlemyer, W. Ligon
  • Computer Science
    2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2008
The results indicate that collective communication is an effective scheme for simplifying consistency checks and significantly improving the performance for several real metadata intensive workloads.

Small-file access in parallel file systems

This paper describes five techniques for optimizing small-file access in parallel file systems at very large scale, implemented in a single parallel file system (PVFS) and then systematically assessed on two test platforms.

Scalable I/O forwarding framework for high-performance computing systems

An I/O protocol and API for shipping function calls from compute nodes to I/O nodes are described, and a quantitative analysis of the overhead associated with I/O forwarding is presented.

Managing Variability in the IO Performance of Petascale Storage Systems

  • J. Lofstead, F. Zheng, M. Wolf
  • Computer Science
    2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2010
These measurements motivate a 'managed' I/O approach that uses adaptive algorithms to vary the I/O system workload based on current load levels and usage areas, achieving higher overall performance and less variability both in a typical usage environment and with artificially introduced levels of 'noise'.