• Corpus ID: 10218000

GPFS: A Shared-Disk File System for Large Computing Clusters

@inproceedings{Schmuck2002GPFSAS,
  title={GPFS: A Shared-Disk File System for Large Computing Clusters},
  author={Frank B. Schmuck and Roger L. Haskin},
  booktitle={FAST},
  year={2002}
}
GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits… 


Evaluating the Shared Root File System Approach for Diskless High-Performance Computing Systems
TLDR
This paper evaluates three networked file system solutions, NFSv4, Lustre and PVFS2, with respect to their performance, scalability, and availability features for servicing a common root file system in a diskless HPC configuration and indicates that Lustre is a viable solution as it meets both scaling and performance requirements.
Evaluation of the Expand parallel file system on a novel cluster-wide I/O architecture
TLDR
A new architecture that combines the advantages of the parallel file systems with the disadvantages of the traditional I/O architecture of a large cluster is proposed, employing Expand (Expandable Parallel File System) developed in my research group, and the results obtained are compared with the current I/O system of a cluster at the HLRS center.
Massive High-Performance Global File Systems for Grid computing
TLDR
The evolution of Global File Systems from the concept of a few years ago to a first demonstration using hardware Fibre Channel frame encoding into IP packets, to a native GFS, to a full prototype demonstration, and finally to a production implementation.
Characterizing HEC Storage Systems at Rest
TLDR
This paper reports on the statistics of supercomputing file systems at rest from a variety of national resource computing sites, contrasts these to studies of the 80s and 90s of academic and software development campuses, and observes the most interesting characteristics in this novel data.
Characterizing HEC Storage Systems at Rest (CMU-PDL-08-109)
TLDR
This paper reports on the statistics of supercomputing file systems at rest from a variety of national resource computing sites, contrasts these to studies of the 80s and 90s of academic and software development campuses, and observes the most interesting characteristics in this novel data.
Parallel Storage Systems for Large Scale Machines
TLDR
This study proposes a dynamically coordinated I/O architecture for addressing some of the limitations that current parallel file systems and storage architectures are facing with very large-scale systems.
A New I/O Architecture for Improving the Performance in Large Scale Clusters
TLDR
This paper proposes a new architecture that solves the problem pointed out before: a new hierarchical I/O architecture based on parallel I/O proxies, and shows the design of the proposed solution and a preliminary evaluation, using a cluster located in the Stuttgart HLRS center.
Using On-Demand File Systems in HPC Environments
TLDR
This work presents a simple solution for applications with very high I/O demands to create a private parallel file system on-demand for an HPC job and use the node-local storage devices, e.g. solid-state-disks (SSD).
On Making GPFS Truly General
TLDR
Over the years, this original design has proven flexible enough to support numerous other application domains, such as cloud computing, network attached storage (NAS), and analytics, as shown in Figure 1.
Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments
TLDR
This paper deploys PVFS2, GPFS, Lustre, and TerraFS for shared deployment across multiple Linux clusters running with different hardware architectures and operating systems and shows that all of the parallel filesystems outperform a legacy NFS system but with different levels of complexity.

References

Implementing Journaling in a Linux Shared Disk File System
TLDR
This paper describes how journaling was implemented in the Global File System (GFS), a shared-disk, cluster file system for Linux, and the evolution of GFS version 3 to version 4, which supports journaling and recovery from client failures.
Scalability in the XFS File System
TLDR
The architecture and design of a new file system, XFS, for Silicon Graphics' IRIX operating system is described, and the use of B+ trees in place of many of the more traditional linear file system structures is discussed.
A 64-bit, shared disk file system for Linux
  • Kenneth W. Preslan, A. Barry, M. O'Keefe
  • Computer Science
    16th IEEE Symposium on Mass Storage Systems in cooperation with the 7th NASA Goddard Conference on Mass Storage Systems and Technologies (Cat. No.99CB37098)
  • 1999
TLDR
The goal is to develop a scalable, server-less file system that integrates IP-based network attached storage and Fibre-Channel-based storage area networks (SAN) and exploits the speed and device scalability of SAN clusters, and provides the client scalability and network interoperability of NAS appliances.
Frangipani: a scalable distributed file system
TLDR
Initial measurements indicate that Frangipani has excellent single-server performance and scales well as servers are added, and can be exported to untrusted machines using ordinary network file access protocols.
Petal: distributed virtual disks
TLDR
The design, implementation, and performance of Petal is described, a system that attempts to approximate this ideal in practice through a novel combination of features.
Distributed token management in Calypso file system
TLDR
An efficient protocol for token arbitration, which minimizes bottlenecks and hence enhances scalability, and a practical approach to handling deadlocks, race conditions, and recovery issues, which complicate token manager design and implementation are presented.
Recovery and Coherency-Control Protocols for Fast Intersystem Page Transfer and Fine-Granularity Locking in a Shared Disks Transaction Environment
Abstract: This paper proposes schemes for fast page transfer between transaction system instances in a shared disks (SD) environment where all the sharing instances can read and modify the same data.
Notes on Data Base Operating Systems
  • J. Gray
  • Computer Science
    Advanced Course: Operating Systems
  • 1978
TLDR
This paper is a compendium of data base management operating systems folklore and focuses on particular issues unique to the transaction management component especially locking and recovery.
Extendible hashing—a fast access method for dynamic files
TLDR
This work studies, by analysis and simulation, the performance of extendible hashing and indicates that it provides an attractive alternative to other access methods, such as balanced trees.
IBM builds world's fastest supercomputer to simulate nuclear testing for U.S. Energy Department
  • Poughkeepsie, N.Y
  • 2000