LACIO: A New Collective I/O Strategy for Parallel I/O Systems

  title={LACIO: A New Collective I/O Strategy for Parallel I/O Systems},
  author={Yong Chen and Xian-he Sun and Rajeev Thakur and Philip C. Roth and William Gropp},
  journal={2011 IEEE International Parallel \& Distributed Processing Symposium},
  • Yong Chen, Xian-he Sun, W. Gropp
  • Published 16 May 2011
  • Computer Science
  • 2011 IEEE International Parallel & Distributed Processing Symposium
Parallel applications benefit considerably from the rapid advance of processor architectures and the massive computational capability now available, but their performance suffers from the large latency of I/O accesses. Poor I/O performance has been identified as a critical cause of the low sustained performance of parallel systems. Collective I/O is widely considered a critical solution that exploits the correlation among I/O accesses from multiple processes of a parallel application and optimizes… 


Iteration Based Collective I/O Strategy for Parallel I/O Systems

A new collective I/O strategy is proposed that reorganizes I/O requests within each file domain instead of coordinating requests across file domains, such that it can eliminate access contentions without introducing extra shuffle cost between aggregators and computing processes.

Revealing applications' access pattern in collective I/O for cache management

This study proposes to reveal otherwise unseen access patterns: it performs collective I/O while retaining applications' access patterns for the underlying cache management, addressing the loss of each individual process's access pattern.

Hierarchical Collective I/O Scheduling for High-Performance Computing

A Transparent Collective I/O Implementation

A user-level library called transparent collective I/O (TCIO) is developed for application developers to easily incorporate collective I/O optimization into their applications and can significantly reduce the programming efforts required for application developers.

Optimizing Performance of Parallel I/O Accesses to Non-contiguous Blocks in Multiple Array Variables

This paper adopts a multi-dataset implementation of HDF5 dataset I/O to aggregate non-contiguous requests for array blocks and provides corresponding parameter assignment strategies to reduce the overheads caused by communication straggler effects in two-phase I/O.

Improving Collective I/O Performance Using Non-volatile Memory Devices

It is demonstrated that by using local storage resources, collective write performance can be greatly improved compared to the case in which only the global parallel file system is used, but can also decrease if the ratio between aggregators and compute nodes is too small.

Heterogeneity-Aware Collective I/O for Parallel I/O Systems with Hybrid HDD/SSD Servers

This paper proposes a heterogeneity-aware collective-I/O strategy, HACIO, which reorganizes the order of I/O requests for each aggregator with awareness of the storage performance of heterogeneous servers, so that the hardware of the systems can be better utilized.

A Checkpoint of Research on Parallel I/O for High-Performance Computing

This survey article focuses on a traditional I/O stack, with a POSIX parallel file system, and aims at identifying the general characteristics of the field and the main current and future research topics.

Improving MPI Collective I/O Performance With Intra-node Request Aggregation

This work presents a new design for collective I/O by adding an extra communication layer that performs request aggregation among processes within the same compute nodes, which can significantly reduce inter-node communication congestion when redistributing the I/O requests.



Making resonance a common case: A high-performance implementation of collective I/O on parallel file systems

Resonant I/O rearranges requests from multiple MPI processes according to the presumed data layout on the disks of I/O nodes so that non-sequential access of disk data can be turned into sequential access, significantly improving I/O performance without compromising the independence of a client-based implementation.

Data sieving and collective I/O in ROMIO

  • R. Thakur, W. Gropp, E. Lusk
  • Computer Science
    Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation
  • 1999
This work describes how the MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests and explains in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes.
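
To illustrate the idea behind data sieving, here is a minimal Python sketch, not ROMIO's actual code. It assumes the simplest case: a single read into a buffer large enough to cover every requested piece, whereas a real implementation works in bounded-size chunks and also handles writes. The function name `data_sieving_read` and the `(offset, length)` request format are hypothetical.

```python
import io

def data_sieving_read(f, requests):
    """Serve noncontiguous (offset, length) requests with one contiguous read.

    Hypothetical helper for illustration only; assumes the whole covering
    range fits in memory.
    """
    if not requests:
        return []
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    f.seek(start)
    # One large read that covers all requested pieces, holes included.
    buf = f.read(end - start)
    # Extract only the requested pieces from the in-memory buffer.
    return [buf[off - start: off - start + length] for off, length in requests]

# Example: a 64-byte "file" whose byte values equal their offsets.
f = io.BytesIO(bytes(range(64)))
pieces = data_sieving_read(f, [(4, 2), (16, 3), (40, 1)])
```

The trade-off is reading some unneeded bytes (the holes) in exchange for issuing one I/O request instead of three.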

Improved parallel I/O via a two-phase run-time access strategy

This work provides experimental results and proposes a two-phase access strategy, to be implemented in a runtime system, in which the data distribution on computational nodes is decoupled from storage distribution, and shows that performance improvements of several orders of magnitude over direct access based data distribution methods can be obtained.
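
The decoupling of storage access from data distribution can be sketched as a single-process Python simulation. This is an illustration under stated assumptions, not the authors' runtime system: processes are modeled as dictionary keys, the file as a bytes object, and the names `two_phase_read`, `requests`, and `n_aggregators` are hypothetical.

```python
def two_phase_read(file_bytes, requests, n_aggregators):
    """Simulate a two-phase read.

    requests: dict mapping rank -> list of (offset, length).
    Returns a dict mapping rank -> bytes, in request order.
    """
    all_reqs = [r for reqs in requests.values() for r in reqs]
    lo = min(off for off, _ in all_reqs)
    hi = max(off + ln for off, ln in all_reqs)
    # Split the aggregate range [lo, hi) into contiguous file domains,
    # one per aggregator (ceiling division for the domain size).
    size = -(-(hi - lo) // n_aggregators)
    domains = [(lo + i * size, min(lo + (i + 1) * size, hi))
               for i in range(n_aggregators)]
    # Phase 1: each aggregator performs one large contiguous read.
    buffers = [file_bytes[s:e] for s, e in domains]
    # Phase 2: redistribute pieces from aggregator buffers to requesters.
    result = {}
    for rank, reqs in requests.items():
        out = bytearray()
        for off, ln in reqs:
            for (s, e), buf in zip(domains, buffers):
                a, b = max(off, s), min(off + ln, e)
                if a < b:  # the request overlaps this file domain
                    out += buf[a - s:b - s]
        result[rank] = bytes(out)
    return result

# Four ranks with interleaved 2-byte requests over a 16-byte file.
data = bytes(range(16))
reqs = {r: [(2 * r, 2), (8 + 2 * r, 2)] for r in range(4)}
result = two_phase_read(data, reqs, n_aggregators=2)
```

Here eight small interleaved requests become two contiguous reads, at the cost of an exchange step that in a real system is inter-process communication.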

Collective buffering: Improving parallel I/O performance

  • B. Nitzberg, V. Lo
  • Computer Science
    Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183)
  • 1997
The general model of the problem is discussed, four Collective Buffering algorithms are described, and experiments show improvements of up to two orders of magnitude over standard techniques and the potential to deliver peak performance with minimal hardware support.

Hiding I/O latency with pre-execution prefetching for parallel applications

It is argued that it is time to revisit the "I/O wall" problem and trade excess computing power for data-access speed, and it is shown that the pre-execution approach is promising in reducing I/O access latency and has real potential.

An Experimental Evaluation of I/O Optimizations on Different Applications

It is shown that with a limited number of I/O resources, it is possible to obtain good performance by using appropriate software optimizations, and that beyond a certain level, imbalance in the architecture results in performance degradation even when using optimized software, thereby indicating the necessity of an increase in I/O resources.

View-Based Collective I/O for MPI-IO

The evaluation section shows that view-based I/O outperforms the original two-phase collective I/O from ROMIO in most of the cases for three well-known parallel I/O benchmarks.

Lightweight I/O for Scientific Applications

It is argued that this approach allows the development of I/O libraries that are both scalable and secure, and is supported with preliminary results for a lightweight checkpoint operation on a development cluster at Sandia.

Parallel I/O in practice

This tutorial discusses parallel file systems (PFSs), covering general concepts and examining three examples: GPFS, Lustre, and PVFS2. It then examines the upper layers of the I/O stack, covering four interfaces: POSIX I/O, MPI-IO, Parallel netCDF, and HDF5.

Design and evaluation of primitives for parallel I/O

The authors have devised an alternative scheme for conducting parallel I/O, the two-phase access strategy, which generates higher and more consistent performance over a wider spectrum of data distributions.