Improving I/O Performance for Exascale Applications Through Online Data Layout Reorganization

Lipeng Wan, Axel Huebl, Junmin Gu, Franz Poeschel, Ana Gainaru, Ruonan Wang, Jieyang Chen, Xin Liang, Dmitry Ganyushin, Todd Munson, Ian T. Foster, Jean-Luc Vay, Norbert Podhorszki, Kesheng Wu, Scott Klasky
IEEE Transactions on Parallel and Distributed Systems
The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load-balancing and mesh-refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O… 
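The I/O difficulty the abstract describes can be made concrete with a small example. After dynamic load balancing or mesh refinement, a single writer may own many small blocks scattered across the global array. The sketch below is not the paper's reorganization method. It is a 1D illustration that assumes each block is a half-open index range, and the helper `coalesce_blocks` is hypothetical. It merges blocks that are adjacent in the global layout, which reduces the number of separate requests sent to storage.

```python
from typing import List, Tuple

# Hypothetical representation: a block is a half-open range [start, end)
# of global array indices owned by one writer.
Block = Tuple[int, int]


def coalesce_blocks(blocks: List[Block]) -> List[Block]:
    """Merge blocks that touch or overlap in the global index space (1D only)."""
    merged: List[Block] = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            # Extend the previous contiguous run instead of starting a new request.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # After load balancing, one writer may own many small, scattered boxes.
    owned = [(40, 48), (0, 8), (8, 16), (16, 24), (64, 72), (48, 56)]
    runs = coalesce_blocks(owned)
    print(len(owned), "requests before,", len(runs), "after:", runs)
```

Here six write requests become three contiguous runs: `(0, 24)`, `(40, 56)`, and `(64, 72)`. Real particle-mesh output is multidimensional, so in practice reorganization is more involved than this 1D case.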
Transitioning from file-based HPC workflows to streaming data pipelines with openPMD and ADIOS2
The work identifies new challenges in resource allocation and a need for flexible data-distribution strategies, and demonstrates their influence on efficiency and scaling on the Summit compute system.
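In a streaming pipeline, each chunk a writer produces must be assigned to a reader, and that assignment is a data-distribution strategy. The sketch below shows one plausible strategy. It is an assumption, not the openPMD-api or ADIOS2 interface, and `distribute_chunks` is a hypothetical name. Each chunk goes to a reader on the same host as its writer, so data stays node-local. Chunks from hosts that have no reader fall back to round-robin over all readers.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, List, Tuple

Chunk = Tuple[str, str]   # (chunk_id, writer_host), hypothetical representation
Reader = Tuple[int, str]  # (reader_id, reader_host)


def distribute_chunks(chunks: List[Chunk], readers: List[Reader]) -> Dict[int, List[str]]:
    """Assign each written chunk to exactly one reader (hypothetical strategy).

    Prefer a reader on the writer's host, rotating among that host's readers;
    otherwise fall back to a global round-robin over all readers.
    """
    if not readers:
        raise ValueError("need at least one reader")
    readers_by_host: Dict[str, List[int]] = defaultdict(list)
    for reader_id, host in readers:
        readers_by_host[host].append(reader_id)

    local_cycles = {host: cycle(ids) for host, ids in readers_by_host.items()}
    global_cycle = cycle([reader_id for reader_id, _ in readers])

    assignment: Dict[int, List[str]] = {reader_id: [] for reader_id, _ in readers}
    for chunk_id, host in chunks:
        target = next(local_cycles[host]) if host in local_cycles else next(global_cycle)
        assignment[target].append(chunk_id)
    return assignment
```

Host-local assignment avoids network traffic but can overload readers on hosts that write a lot. That trade-off between locality and balance is the kind of resource-allocation question the summary above refers to.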
Modeling of advanced accelerator concepts
Computer modeling is essential to research on Advanced Accelerator Concepts (AAC), as well as to their design and operation. This paper summarizes the current status and future needs of AAC systems…


Using active NVRAM for I/O staging
This paper proposes a mechanism in which each physical node has an additional active NVRAM component that stages I/O and applies simple data analytics operations to the I/O data. Experimental results show that the approach effectively addresses the "right memory sizing" issue through efficient I/O data processing.
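The idea of staging I/O while computing simple analytics on it can be sketched with a small buffer class. This is a hypothetical model, not the paper's design: `StagingBuffer` stands in for the node-local NVRAM, and `sink` stands in for the slower storage tier. Running statistics are updated while the data is still in the fast tier, and the buffer is flushed when it reaches capacity.

```python
from typing import Callable, List, Optional


class StagingBuffer:
    """Hypothetical node-local staging area standing in for active NVRAM."""

    def __init__(self, capacity: int, sink: Callable[[List[float]], None]) -> None:
        self.capacity = capacity
        self.sink = sink  # stand-in for the slower storage tier
        self.buffer: List[float] = []
        self.count = 0
        self.total = 0.0
        self.minimum: Optional[float] = None
        self.maximum: Optional[float] = None

    def write(self, values: List[float]) -> None:
        for v in values:
            # Update running statistics while the data is still in fast memory.
            self.count += 1
            self.total += v
            self.minimum = v if self.minimum is None else min(self.minimum, v)
            self.maximum = v if self.maximum is None else max(self.maximum, v)
            self.buffer.append(v)
            if len(self.buffer) >= self.capacity:
                self.flush()

    def flush(self) -> None:
        """Hand the staged data to the sink and start a fresh buffer."""
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []

    def mean(self) -> Optional[float]:
        return self.total / self.count if self.count else None
```

For example, with `capacity=3` and `sink=flushed.append`, writing `[1.0, 5.0, 2.0, 4.0]` and then calling `flush()` hands two batches to the sink. The statistics (min 1.0, max 5.0, mean 3.0) are available without rereading the data from storage.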
EDO: Improving Read Performance for Scientific Applications through Elastic Data Organization
The experimental results demonstrate that EDO is able to achieve balanced data distribution across all dimensions and improve the read performance of multidimensional datasets in scientific applications.
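Balancing data "across all dimensions" means that a read along any axis of a multidimensional array should touch many storage targets, not just one. This summary does not describe EDO's exact placement scheme. The sketch below therefore assumes one common technique: ordering chunks along a Z-order (Morton) space-filling curve before assigning them round-robin to targets. The function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]


def morton_2d(row: int, col: int, bits: int) -> int:
    """Interleave the bits of (row, col) into a Z-order (Morton) index."""
    code = 0
    for b in range(bits):
        code |= ((col >> b) & 1) << (2 * b)
        code |= ((row >> b) & 1) << (2 * b + 1)
    return code


def place_chunks(n: int, targets: int, order: Callable[[int, int], int]) -> Dict[Cell, int]:
    """Assign each chunk of an n x n grid to a storage target round-robin along `order`."""
    return {(r, c): order(r, c) % targets for r in range(n) for c in range(n)}


def targets_touched(placement: Dict[Cell, int], cells: List[Cell]) -> int:
    """Number of distinct storage targets a read of `cells` must contact."""
    return len({placement[cell] for cell in cells})


if __name__ == "__main__":
    n, targets, bits = 4, 4, 2
    layouts = {
        "row-major": lambda r, c: r * n + c,
        "z-order": lambda r, c: morton_2d(r, c, bits),
    }
    for name, order in layouts.items():
        placement = place_chunks(n, targets, order)
        row_read = [(0, c) for c in range(n)]
        col_read = [(r, 0) for r in range(n)]
        print(name, "row read:", targets_touched(placement, row_read),
              "column read:", targets_touched(placement, col_read))
```

With a 4 x 4 chunk grid and four targets, row-major placement spreads a row read over 4 targets but sends a whole column read to a single target. Z-order placement gives 2 targets for both, so neither read direction is heavily penalized.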
Model-Driven Data Layout Selection for Improving Read Performance
The paper introduces a model-driven strategy for selecting data layouts that benefit different read patterns. It also develops a parallel I/O model based on the striping parameters of the Lustre file system and on block-level striping across the RAID-based disks within a Lustre Object Storage Target (OST).
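The Lustre part of such a model rests on round-robin (RAID-0 style) striping: stripe k of a file lives on OST k mod stripe_count, counted from the file's first OST. The sketch below is a deliberately simplified cost model and is not the paper's model. It maps requested byte extents to per-OST load and assumes that OSTs serve requests in parallel, so the busiest OST bounds read time. It ignores the block-level RAID striping inside each OST. The function names and the bandwidth parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (offset, length) in bytes


def bytes_per_ost(extents: List[Extent], stripe_size: int, stripe_count: int) -> Dict[int, int]:
    """Map byte extents to OSTs under round-robin striping."""
    load: Counter = Counter()
    for offset, length in extents:
        pos, end = offset, offset + length
        while pos < end:
            stripe = pos // stripe_size
            stripe_end = (stripe + 1) * stripe_size
            chunk = min(end, stripe_end) - pos  # bytes of this extent in this stripe
            load[stripe % stripe_count] += chunk
            pos += chunk
    return dict(load)


def estimated_read_time(extents: List[Extent], stripe_size: int, stripe_count: int,
                        ost_bandwidth: float) -> float:
    """Simplified model: the most heavily loaded OST bounds the read time."""
    load = bytes_per_ost(extents, stripe_size, stripe_count)
    return max(load.values(), default=0) / ost_bandwidth
```

A layout selector could evaluate `estimated_read_time` for each candidate layout against the expected read pattern and choose the cheapest. For example, a contiguous 4 MiB read with 1 MiB stripes costs four times as much on one OST as when it is striped over four.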
In-situ assessment of device-side compute work for dynamic load balancing in a GPU-accelerated PIC code
Maintaining computational load balance is important to the performant behavior of codes which operate under a distributed computing model. This is especially true for GPU architectures, which can…
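Once per-box costs have been measured in situ (for example, device-side kernel time), the load balancer must redistribute boxes among ranks. The paper's distribution algorithm is not given here. The sketch below substitutes a standard greedy heuristic, longest processing time first: boxes are sorted by cost, and each is given to the currently least-loaded rank. It assumes the measured costs are already available as a list, and `assign_boxes` is a hypothetical name.

```python
import heapq
from typing import Dict, List


def assign_boxes(costs: List[float], n_ranks: int) -> Dict[int, List[int]]:
    """Greedy longest-processing-time assignment of boxes to ranks.

    `costs[i]` is the measured cost of box i (assumed already gathered in situ).
    """
    heap = [(0.0, rank) for rank in range(n_ranks)]  # (current load, rank)
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {rank: [] for rank in range(n_ranks)}
    # Placing the most expensive boxes first tightens the greedy bound.
    for box in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(box)
        heapq.heappush(heap, (load + costs[box], rank))
    return assignment
```

With costs `[5, 4, 3, 3, 2, 1]` on two ranks, each rank ends up with a total cost of 9. Because the costs are measured rather than estimated from particle counts, the same routine adapts when GPU work per box varies for reasons a simple count would miss.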
Improving Parallel I/O Performance with Data Layout Awareness
This study proposes a data layout-aware optimization strategy that better integrates the parallel I/O middleware with the parallel file system, the two major components of current parallel I/O systems, in order to improve data access performance.
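A common example of layout awareness in I/O middleware is aligning the file domains of collective-I/O aggregators to stripe boundaries. Each aggregator then writes whole stripes, and no two aggregators write to the same stripe, which can reduce contention on the file system. This technique is a well-known illustration and is not necessarily the study's strategy. The helper `aligned_file_domains` is hypothetical.

```python
from typing import List, Tuple


def aligned_file_domains(start: int, end: int, n_aggregators: int,
                         stripe_size: int) -> List[Tuple[int, int]]:
    """Split [start, end) among aggregators with boundaries on stripe multiples."""
    first = start // stripe_size
    last = -(-end // stripe_size)  # ceiling division
    per_agg, extra = divmod(last - first, n_aggregators)
    domains = []
    s = first
    for a in range(n_aggregators):
        count = per_agg + (1 if a < extra else 0)  # whole stripes for this aggregator
        lo = max(start, s * stripe_size)
        hi = min(end, (s + count) * stripe_size)
        if lo < hi:  # aggregators left with no stripes get no domain
            domains.append((lo, hi))
        s += count
    return domains
```

For the range `[100, 1000)` with three aggregators and 256-byte stripes, this yields `(100, 512)`, `(512, 768)`, and `(768, 1000)`. An even byte split would place boundaries at 400 and 700, inside stripes, so two aggregators would share a stripe.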
TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers
This paper shows how TAPIOCA uses double-buffering and one-sided communication to minimize idle time during data aggregation. It also introduces a cost model that yields a topology-aware aggregator placement that optimizes data movement.
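A topology-aware placement can be framed as choosing the aggregator node that minimizes a data-movement cost. The sketch below uses a simplified, hypothetical cost: bytes gathered from each node, weighted by hop distance to the aggregator, plus total bytes weighted by the hop distance from the aggregator to the I/O node. It assumes a 2D mesh with Manhattan distances. TAPIOCA's actual model may include latency and bandwidth terms that are not reproduced here.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def hops(a: Coord, b: Coord) -> int:
    """Manhattan distance on a 2D mesh (an assumed, simplified topology)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_aggregator(data: Dict[Coord, int], io_node: Coord) -> Coord:
    """Pick the data-holding node that minimizes a hop-weighted movement cost.

    cost(a) = sum_i bytes_i * hops(i, a) + total_bytes * hops(a, io_node)
    """
    total = sum(data.values())

    def cost(agg: Coord) -> int:
        gather = sum(nbytes * hops(node, agg) for node, nbytes in data.items())
        return gather + total * hops(agg, io_node)

    return min(data, key=cost)
```

With 100 bytes on each of `(0, 0)` and `(0, 1)`, 10 bytes on `(3, 3)`, and the I/O node at `(0, 2)`, the costs are 580, 360, and 1940, so `(0, 1)` is chosen. It is close to most of the data and one hop from the I/O node.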
Optimizing Parallel I/O Accesses through Pattern-Directed and Layout-Aware Replication
This paper proposes PDLA, a pattern-directed and layout-aware data replication design that improves the performance of parallel I/O systems, and implements the replication scheme in the MPICH2 library on top of the OrangeFS file system.
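The core idea of pattern-directed replication is to store the regions of a frequently observed noncontiguous access pattern contiguously in a replica, so that the pattern can be served with one sequential read. The sketch below shows only that remapping. How PDLA detects patterns and places replicas across servers is not reproduced, and `build_replica` and `read_pattern` are hypothetical names.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int]  # (offset, length) in the original file


def build_replica(data: bytes, pattern: List[Region]) -> Tuple[bytes, Dict[int, int]]:
    """Copy the regions of an observed access pattern into one contiguous replica.

    Returns the replica and a map from each region's original offset to its
    offset in the replica.
    """
    replica = bytearray()
    offset_map: Dict[int, int] = {}
    for offset, length in pattern:
        offset_map[offset] = len(replica)
        replica += data[offset:offset + length]
    return bytes(replica), offset_map


def read_pattern(replica: bytes, offset_map: Dict[int, int],
                 pattern: List[Region]) -> List[bytes]:
    """Serve the same pattern from the replica with one contiguous span read."""
    first = offset_map[pattern[0][0]]
    total = sum(length for _, length in pattern)
    span = replica[first:first + total]  # single sequential read
    out, pos = [], 0
    for _, length in pattern:
        out.append(span[pos:pos + length])
        pos += length
    return out
```

A strided pattern such as `[(0, 2), (8, 2), (16, 2), (24, 2)]` touches four separate places in the original file. In the replica, the same pattern is one 8-byte span.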
DataStager: scalable data staging services for petascale applications
Experimental evaluations of the flexible ‘DataStager’ framework establish both the necessity of intelligent data staging and the high performance of the approach, using the GTC fusion modeling code and benchmarks running on 1000+ processors.
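The benefit of data staging comes from decoupling output from computation: the application hands off its data and continues, while a separate agent moves the data to storage. DataStager does this with dedicated staging nodes. The sketch below only mimics the decoupling within one process, using a background thread and a bounded queue. `stage_outputs` is a hypothetical name, and storage is simulated by a list.

```python
import queue
import threading
from typing import List, Optional


def stage_outputs(outputs: List[bytes], capacity: int = 4) -> List[bytes]:
    """Hypothetical sketch of asynchronous data staging.

    The application thread hands each output buffer to a bounded queue and
    returns to computing; a 'stager' thread drains the queue to (simulated)
    storage. The bounded queue applies back-pressure if the stager falls behind.
    """
    buffer: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=capacity)
    storage: List[bytes] = []

    def stager() -> None:
        while True:
            item = buffer.get()
            if item is None:  # end-of-run marker
                break
            storage.append(item)  # stand-in for a write to the file system

    worker = threading.Thread(target=stager)
    worker.start()
    for out in outputs:
        buffer.put(out)  # blocks only when the staging buffer is full
        # ... the application would continue its next compute step here ...
    buffer.put(None)
    worker.join()
    return storage
```

The bounded queue is a deliberate design choice. Staging memory is finite, so when the consumer cannot keep up, the producer should slow down rather than allocate memory without limit.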
Enabling dynamic file I/O path selection at runtime for parallel file system
The framework adopts a file-handle-rich scheme that lets file systems choose the corresponding optimizations for serving I/O requests, and proposes consistency control algorithms that preserve data consistency when optimizations change at runtime.
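The consistency problem arises when the I/O path changes while earlier requests are still in flight on the old path. The sketch below is a hypothetical `AdaptiveHandle`, not the framework's API. The handle carries its current policy: "direct" writes go straight to simulated storage, and "buffered" writes are held in memory. Before switching policies, the handle flushes all pending buffered writes, so no write issued under the old policy is lost or reordered.

```python
from typing import Dict, List, Tuple


class AdaptiveHandle:
    """Hypothetical file handle that carries its current I/O optimization."""

    def __init__(self, policy: str = "direct") -> None:
        self.policy = policy
        self.storage: Dict[int, bytes] = {}  # offset -> data (simulated file)
        self.pending: List[Tuple[int, bytes]] = []

    def write(self, offset: int, data: bytes) -> None:
        if self.policy == "buffered":
            self.pending.append((offset, data))
        else:
            self.storage[offset] = data

    def read(self, offset: int) -> bytes:
        # Newest buffered write wins, so reads see writes not yet flushed.
        for off, data in reversed(self.pending):
            if off == offset:
                return data
        return self.storage.get(offset, b"")

    def flush(self) -> None:
        for offset, data in self.pending:
            self.storage[offset] = data
        self.pending.clear()

    def switch_policy(self, new_policy: str) -> None:
        if new_policy not in ("direct", "buffered"):
            raise ValueError(f"unknown policy: {new_policy}")
        # Consistency step: drain state from the old path before switching.
        self.flush()
        self.policy = new_policy
```

Without the `flush()` in `switch_policy`, a buffered write followed by a direct write to the same offset could be overwritten later by the stale buffered copy.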
Spatially-aware Parallel I/O for Particle Data
This work proposes an adaptive aggregation technique that improves data aggregation performance for both uniform and non-uniform particle distributions. It also enables efficient reads by employing level-of-detail reordering and a multi-resolution layout.
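A level-of-detail reordering arranges particles so that any prefix of the array forms a coarse but evenly spread subsample. A reader can then load a low-resolution view by reading only the beginning of the data. The sketch below is a hypothetical `lod_order` and not the paper's scheme. It assumes the particles are already sorted spatially (for example, along a space-filling curve), so that evenly strided indices approximate an even spatial subsample.

```python
from typing import List


def lod_order(n: int, levels: int) -> List[int]:
    """Return a permutation of range(n) ordered coarse-to-fine.

    Level 0 holds every 2**(levels-1)-th index; each following level adds the
    indices that halve the stride, and the last level (stride 1) adds the rest.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    order: List[int] = []
    taken = [False] * n
    for level in range(levels):
        stride = 2 ** (levels - 1 - level)
        for i in range(0, n, stride):
            if not taken[i]:
                taken[i] = True
                order.append(i)
    return order
```

For 8 particles and 3 levels the order is `[0, 4, 2, 6, 1, 3, 5, 7]`. Reading the first 2 entries gives a 1/4 subsample, the first 4 give a 1/2 subsample, and all 8 give full resolution.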