Brian R. Toonen

Learn More
Application development for distributed-computing ‘‘Grids’’ can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to(More)
Improvements in the performance of processors and networks make it both feasible and interesting to treat collections of workstations, servers, clusters, and supercomputers as integrated computational resources, or Grids. However, the highly heterogeneous and dynamic nature of such Grids can make application development difficult. Here we describe an(More)
Summary form only given. For several years, MPI has been the de facto standard for writing parallel applications. One of the most popular MPI implementations is MPICH. Its successor, MPICH2, features a completely new design that provides more performance and flexibility. To ensure portability, it has a hierarchical structure based on which porting can be(More)
We describe the design of a parallel global atmospheric circulation model, PCCM2. This parallel model is functionally equivalent to the National Center for Atmospheric Research's Community Climate Model, CCM2, but is struc-tured to exploit distributed memory multicomputers. PCCM2 incorporates parallel spectral transform, semi-Lagrangian transport, and load(More)
Implementations of climate models on scalable parallel computer systems can suuer from load imbalances because of temporal and spatial variations in the amount of computation required for physical pa-rameterizations such as solar radiation and convec-tive adjustment. We have developed specialized techniques for correcting such imbalances. These techniques(More)
During the past few years, two main approaches have been taken to improve the performance of software shared memory implementations: relaxing consistency models and providing fine-grained access control. Their performance tradeoffs, however, we not well understood. This paper studies these tradeoffs on a platform that provides access control in hardware but(More)
Parallel programmers typically assume that all resources required for a program's execution are dedicated to that purpose. However, in local and wide area networks, contention for shared networks, CPUs, and I/O systems can result in significant variations in availability, with consequent adverse effects on overall performance. We describe a new(More)
The BlueGene/L supercomputer will consist of 65,536 dual-processor compute nodes interconnected by two high-speed networks: a three-dimensional torus network and a tree topology network. Each compute node can only address its own local memory, making message passing the natural programming model for BlueGene/L. In this paper we present our implementation of(More)
of message-passing services for the Blue Gene/L supercomputer G. Almási C. Archer J. G. Castaños J. A. Gunnels C. C. Erway P. Heidelberger X. Martorell J. E. Moreira K. Pinnow J. Ratterman B. D. Steinmacher-Burow W. Gropp B. Toonen The Blue Genet/L (BG/L) supercomputer, with 65,536 dualprocessor compute nodes, was designed from the ground up to support(More)
Large scale supercomputing applications typically run on clusters using vendor message passing libraries, limiting the application to the availability of memory and CPU resources on that single machine. The ability to run inter-cluster parallel code is attractive since it allows the consolidation of multiple large scale resources for computational(More)