Enhance parallel input/output with cross-bundle aggregation

  title={Enhance parallel input/output with cross-bundle aggregation},
  author={Teng Wang and Kevin Vasko and Zhuo Liu and Hui Chen and Weikuan Yu},
  journal={The International Journal of High Performance Computing Applications},
  pages={241 - 256}
  • Teng Wang, K. Vasko, Weikuan Yu
  • Published 1 May 2016
  • Computer Science
  • The International Journal of High Performance Computing Applications
The exponential growth of computing power on leadership-scale computing platforms poses a grand challenge to the input/output (I/O) performance of scientific applications. To bridge the performance gap between computation and I/O, various parallel I/O libraries have been developed and adopted by computer scientists. These libraries enhance I/O parallelism by allowing multiple processes to concurrently access a shared data set. Meanwhile, they are integrated with a set of I/O optimization…

IOMiner: Large-Scale Analytics Framework for Gaining Knowledge from I/O Logs

IOMiner provides an easy-to-use interface for analyzing instrumentation data, a unified storage schema that hides the heterogeneity of the raw instrumentation data, and a sweep-line-based algorithm for root cause analysis of poor application I/O performance.

UniviStor: Integrated Hierarchical and Distributed Storage for HPC

UniviStor is introduced: a data management service offering a unified view of storage layers that provides performance optimizations and data structures tailored for distributed and hierarchical data placement, interference-aware data movement scheduling, adaptive data striping, and lightweight workflow management.

Increasing the efficiency of modeling the energy characteristics of nanoclusters

Context. The paper presents a computational scheme and a number of techniques for increasing the efficiency of mathematical modeling of the energy characteristics of metal clusters with

Efficient Storage Design and Query Scheduling for Improving Big Data Retrieval and Analytics

By leveraging the advanced features of cutting-edge non-volatile memories, a Phase Change Memory (PCM)-based hybrid storage architecture is presented, which provides efficient buffer management and novel wear-leveling techniques, thus achieving highly improved data retrieval performance while also solving the PCM’s bottleneck issue.



BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution

A Bundle-based PARallel Aggregation framework (BPAR) is proposed, along with three partitioning schemes under that framework, aimed at improving the I/O performance of the mission-critical application GEOS-5 as well as a broad range of other scientific applications.

Scaling parallel I/O performance through I/O delegate and caching system

  • Arifa Nisar, W. Liao, A. Choudhary
  • Computer Science
    2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2008
A portable MPI-IO layer is proposed where certain tasks, such as file caching, consistency control, and collective I/O optimization, are delegated to a small set of compute nodes, collectively termed I/O Delegate nodes, which alleviates lock contention at the I/O servers.

ParColl: Partitioned Collective I/O on the Cray XT

  • Weikuan Yu, J. Vetter
  • Computer Science
    2008 37th International Conference on Parallel Processing
  • 2008
This paper introduces a novel technique called ParColl, which augments the original two-phase collective I/O protocol with new mechanisms for file area partitioning, I/O aggregator distribution, and intermediate file views, which greatly reduce the cost of global synchronization.

Scalable I/O forwarding framework for high-performance computing systems

An I/O protocol and API for shipping function calls from compute nodes to I/O nodes are described, and a quantitative analysis of the overhead associated with I/O forwarding is presented.

Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application

  • Zhuo Liu, Bin Wang, S. Klasky
  • Computer Science
    2013 22nd International Conference on Computer Communication and Networks (ICCCN)
  • 2013
This paper adopts a mission-critical scientific application, GEOS-5, as a case to profile and analyze the communication and I/O issues that are preventing applications from fully utilizing the underlying parallel storage systems, and redesigns its I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance.

Data sieving and collective I/O in ROMIO

  • R. Thakur, W. Gropp, E. Lusk
  • Computer Science
    Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation
  • 1999
This work describes how the MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests and explains in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes.
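
As a rough illustration of the data sieving idea named above, the sketch below serves several noncontiguous requests with one contiguous read and then extracts the requested pieces in memory. It is not ROMIO code: the function name `sieve_read` and the `(offset, length)` request format are assumptions made for this example, and it ignores the bounded buffer that a real implementation would use to limit how much is read at once.

```python
import io

def sieve_read(f, requests):
    """Serve noncontiguous (offset, length) requests with a single read.

    Hypothetical helper for illustration only; not part of ROMIO or MPI-IO.
    """
    if not requests:
        return []
    # Find the smallest contiguous extent that covers every request.
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    # One large read, including the unwanted "holes" between requests.
    f.seek(start)
    block = f.read(end - start)
    # Pick the requested pieces out of the in-memory block.
    return [block[off - start: off - start + length] for off, length in requests]

# Usage: three small, scattered requests against a 64-byte in-memory "file".
data = bytes(range(64))
pieces = sieve_read(io.BytesIO(data), [(4, 3), (20, 2), (40, 4)])
```

The trade-off is that extra bytes are read in exchange for fewer I/O calls, which tends to pay off when the requests are small and close together.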

A lightweight I/O scheme to facilitate spatial and temporal queries of scientific data analytics

A novel I/O scheme named STAR (Spatial and Temporal AggRegation) is proposed to enable high-performance data queries for scientific analytics, including efficient queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques.

Locality-driven high-level I/O aggregation for processing scientific datasets

The proposed locality-driven high-level I/O aggregation approach holds promise for efficiently processing scientific datasets, which is critical for the data-intensive or big data computing era.

Combining I/O operations for multiple array variables in parallel netCDF

This paper presents a new mechanism for PnetCDF to combine multiple I/O operations for better I/O performance, used in a new function that takes arguments for reading/writing multiple array variables, allowing application programmers to explicitly access multiple array variables in a single call.

Accelerating I/O Forwarding in IBM Blue Gene/P Systems

The performance of the existing I/O forwarding mechanisms for BG/P is evaluated and the performance bottlenecks in the current design are identified; I/O forwarding is then augmented with two approaches: I/O scheduling using a work-queue model and asynchronous data staging.