Extending the POSIX I/O interface: a parallel file system perspective.
TLDR
The rationale, design, and evaluation of a reference implementation of a subset of the POSIX I/O interfaces on a widely used parallel file system (PVFS) on clusters confirm that the extensions to the POSIX interface greatly improve scalability and performance.
Red storm IO performance analysis
TLDR
This paper will summarize an IO performance analysis effort performed on Sandia National Laboratories' Red Storm platform to identify problems or bottlenecks in any aspect of the IO subsystem.
Shared Libraries on a Capability Class Computer
TLDR
This paper will first provide some background on the Linux implementation of shared libraries, which was not designed for distributed HPC platforms, and this introductory information will be used to identify the scalability issues for massively parallel systems such as the Cray XT/XE product lines.
Report of experiments and evidence for ASC L2 milestone 4467 : demonstration of a legacy application's path to exascale.
TLDR
The purpose of this report is to prove that the team has completed milestone 4467 (Demonstration of a Legacy Application's Path to Exascale) and to determine where the breaking point is for an existing highly scalable application.
Detailed analysis of I/O traces for large scale applications
TLDR
The analysis showed that the I/O traces reveal much information about the application even without access to the source code, and provide multiple indications of the algorithmic nature of the application through the changes in data volume and I/O request distribution at the checkpoints.
A Case for Optimistic Coordination in HPC Storage Systems
TLDR
The prototype illustrates that conditional operations can be easily integrated into distributed object storage systems and can outperform standard coordination primitives for simple update workloads. It also shows that conditional updates can achieve over two orders of magnitude higher performance than pessimistic locking for some parallel read/modify/write workloads.
An extensible, portable, scalable cluster management software architecture
TLDR
This paper describes an object-oriented software architecture for cluster integration and management that enables extensibility, portability, and scalability, and that has been successfully implemented and deployed on several large-scale production clusters at Sandia National Laboratories.
Using the Sirocco File System for High-Bandwidth Checkpoints
TLDR
This demonstration of early Sirocco functionality shows a significant benefit for a real I/O workload, checkpointing, in a real application, CTH, which was able to store checkpoints 10-60x faster than storing to PanFS, allowing the job to continue computing sooner.