BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution

  title={BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution},
  author={Teng Wang and Kevin Vasko and Zhuo Liu and Hui Chen and Weikuan Yu},
  journal={2014 International Workshop on Data Intensive Scalable Computing Systems},
  • Teng Wang, K. Vasko, Weikuan Yu
  • Published 16 November 2014
  • Computer Science
  • 2014 International Workshop on Data Intensive Scalable Computing Systems
In today's "Big Data" era, developers have adopted I/O techniques such as MPI-IO, Parallel NetCDF and HDF5 to garner enough performance to manage the vast amount of data that scientific applications require. These I/O techniques offer parallel access to shared datasets, together with a set of optimizations such as data sieving and two-phase I/O, to boost I/O throughput. While most of these techniques focus on optimizing the access pattern on a single file or file extent, few of these…


Enhance parallel input/output with cross-bundle aggregation

The experimental results reveal that BPAR can deliver a 2.1× I/O performance improvement over the baseline GEOS-5, and that it is very promising for accelerating scientific applications' I/O performance on various computing platforms.

UniviStor: Integrated Hierarchical and Distributed Storage for HPC

UniviStor is introduced, a data management service offering a unified view of storage layers that provides performance optimizations and data structures tailored for distributed and hierarchical data placement, interference-aware data movement scheduling, adaptive data striping, and lightweight workflow management.

TRIO: Burst Buffer Based I/O Orchestration

This paper proposes a burst buffer based I/O orchestration framework, named TRIO, to intercept and reshape bursty writes into better sequential write traffic to storage servers, and demonstrates that TRIO can efficiently utilize storage bandwidth and reduce the average job I/O time by 37% for data-intensive applications in typical checkpointing scenarios.

Improving the Performance of Heterogeneous Hadoop Clusters Using Map Reduce

This paper intends to assess the adequacy of new algorithms, comparisons, and proposals, and an aggressive approach to finding the best solution for improving the efficiency of Hadoop clusters in storing and analyzing big data.

Efficient Storage Design and Query Scheduling for Improving Big Data Retrieval and Analytics

By leveraging the advanced features of cutting-edge non-volatile memories, a Phase Change Memory (PCM)-based hybrid storage architecture is presented and devised, which provides efficient buffer management and novel wear leveling techniques, thus achieving highly improved data retrieval performance and at the same time solving the PCM’s bottleneck issue.



Profiling and Improving I/O Performance of a Large-Scale Climate Scientific Application

  • Zhuo Liu, Bin Wang, S. Klasky
  • Computer Science
    2013 22nd International Conference on Computer Communication and Networks (ICCCN)
  • 2013
This paper adopts a mission-critical scientific application, GEOS-5, as a case to profile and analyze the communication and I/O issues that are preventing applications from fully utilizing the underlying parallel storage systems, and redesigns its I/O framework along with a set of parallel I/O techniques to achieve high scalability and performance.

ParColl: Partitioned Collective I/O on the Cray XT

  • Weikuan Yu, J. Vetter
  • Computer Science
    2008 37th International Conference on Parallel Processing
  • 2008
This paper introduces a novel technique called ParColl, which augments the original two-phase collective I/O protocol with new mechanisms for file area partitioning, I/O aggregator distribution and intermediate file views, which greatly reduce the cost of global synchronization.
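
The original two-phase collective I/O protocol that ParColl builds on can be illustrated with a small in-memory simulation. This is only a sketch of the general idea (exchange data so that each aggregator owns one contiguous file domain, then issue one contiguous write per aggregator); it does not model ParColl's partitioning or intermediate file views, and the function name `two_phase_write` and its parameters are hypothetical, not taken from any MPI-IO library.

```python
def two_phase_write(requests, file_size, num_aggregators):
    """Simulate two-phase collective writing in memory (illustrative only).

    requests: one list per process of (offset, payload_bytes) pieces.
    Returns (file_bytes, writes), where writes lists one
    (aggregator, offset, length) tuple per contiguous write issued.
    """
    # Split the file into equal contiguous domains, one per aggregator.
    domain = -(-file_size // num_aggregators)  # ceiling division
    buffers = [
        bytearray(max(0, min(domain, file_size - a * domain)))
        for a in range(num_aggregators)
    ]

    # Phase 1 (exchange): route every byte to the aggregator owning its offset.
    for pieces in requests:
        for offset, payload in pieces:
            for i, value in enumerate(payload):
                pos = offset + i
                agg = pos // domain
                buffers[agg][pos - agg * domain] = value

    # Phase 2 (I/O): each aggregator writes its domain as one contiguous block.
    out = bytearray(file_size)
    writes = []
    for agg, buf in enumerate(buffers):
        start = agg * domain
        out[start:start + len(buf)] = buf
        writes.append((agg, start, len(buf)))
    return bytes(out), writes


# Four processes, each holding small interleaved 2-byte pieces of a 32-byte file.
requests = [[(p * 2 + k * 8, bytes([p]) * 2) for k in range(4)] for p in range(4)]
file_bytes, writes = two_phase_write(requests, file_size=32, num_aggregators=2)
```

In this toy run, sixteen small noncontiguous pieces are turned into two contiguous 16-byte writes, which is the effect the two-phase protocol aims for.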

Locality-driven high-level I/O aggregation for processing scientific datasets

The proposed locality-driven high-level I/O aggregation approach holds promise for efficiently processing scientific datasets, which is critical in the era of data-intensive or big data computing.

A lightweight I/O scheme to facilitate spatial and temporal queries of scientific data analytics

A novel I/O scheme named STAR (Spatial and Temporal AggRegation) is proposed to enable high-performance data queries for scientific analytics; it supports efficient data queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques.

Scaling parallel I/O performance through I/O delegate and caching system

  • Arifa Nisar, W. Liao, A. Choudhary
  • Computer Science
    2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
  • 2008
A portable MPI-IO layer is proposed where certain tasks, such as file caching, consistency control, and collective I/O optimization, are delegated to a small set of compute nodes, collectively termed I/O Delegate nodes, which alleviates the lock contention at I/O servers.

Combining I/O operations for multiple array variables in parallel netCDF

This paper presents a new mechanism for PnetCDF to combine multiple I/O operations for better I/O performance, used in a new function that takes arguments for reading/writing multiple array variables, allowing application programmers to explicitly access multiple array variables in a single call.

Parallel netCDF: A High-Performance Scientific I/O Interface

This work presents a new parallel interface for writing and reading netCDF datasets that defines semantics for parallel access and is tailored for high performance, and compares the implementation strategies and performance with HDF5.

Data sieving and collective I/O in ROMIO

  • R. Thakur, W. Gropp, E. Lusk
  • Computer Science
    Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation
  • 1999
This work describes how the MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests and explains in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes.
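
As a rough illustration of data sieving for reads, the sketch below serves several noncontiguous requests from a single process with one large contiguous read and then extracts the requested pieces from the in-memory buffer. It is a simplified assumption of how the technique works in principle, not ROMIO's implementation; the helper name `sieved_read` is hypothetical, and real implementations also bound the buffer size and handle writes with read-modify-write.

```python
import io


def sieved_read(f, requests):
    """Serve noncontiguous (offset, length) requests with one contiguous read."""
    if not requests:
        return []
    # Find the single extent that covers every requested range.
    start = min(offset for offset, _ in requests)
    end = max(offset + length for offset, length in requests)

    # One large read, including the unwanted "holes" between requests.
    f.seek(start)
    buf = f.read(end - start)

    # Pick the requested pieces out of the buffer in memory.
    return [buf[offset - start:offset - start + length] for offset, length in requests]


# A 64-byte in-memory "file" stands in for a file on a parallel file system.
data = bytes(range(64))
pieces = sieved_read(io.BytesIO(data), [(4, 4), (20, 4), (40, 8)])
```

The trade-off is that the single read transfers 44 bytes to satisfy 16 bytes of requests, exchanging extra data volume for fewer I/O calls.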

Scalable I/O forwarding framework for high-performance computing systems

An I/O protocol and API for shipping function calls from compute nodes to I/O nodes are described, and a quantitative analysis of the overhead associated with I/O forwarding is presented.

Tuning HDF5 for Lustre File Systems

It is demonstrated that the combined optimizations improve HDF5 parallel I/O performance by up to 33 times, in some cases running close to the achievable peak performance of the underlying file system, and deliver scalable performance up to 40,960-way concurrency.