Distributed MPI cross-site run performance using MPIg

Abstract

Large-scale supercomputing applications typically run on clusters using vendor message-passing libraries, limiting the application to the memory and CPU resources available on that single machine. The ability to run inter-cluster parallel code is attractive since it allows the consolidation of multiple large-scale resources for computational simulations not possible on a single machine, and it also allows the aggregation of small subsets of CPU cores for rapid turnaround, for example in the case of high-availability computing. MPIg is a grid-enabled implementation of the Message Passing Interface (MPI), extending the MPICH implementation of MPI to use Globus Toolkit services such as resource allocation and authentication. To achieve co-availability of resources, HARC, the Highly-Available Resource Co-allocator, is used. Here we examine two applications using MPIg: LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator), used with a replica exchange molecular dynamics approach to enhance binding affinity calculations in HIV drug research, and HemeLB, a lattice-Boltzmann solver designed to address fluid flow in geometries such as the human cerebral vascular system. The cross-site scalability of both applications is tested and compared to single-machine performance. In HemeLB, communication costs are hidden by effectively overlapping non-blocking communication with computation, so that it scales essentially linearly across multiple sites, and LAMMPS scales almost as well when run between two widely geographically separated sites as it does at a single site.
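The key performance claim for HemeLB is that wide-area communication costs are hidden by overlapping non-blocking communication with computation. The C/MPI fragment below is a minimal sketch of that general pattern, not code taken from HemeLB or MPIg; the ring neighbours, buffer sizes and compute loop are illustrative assumptions.

/*
 * Sketch of communication/computation overlap: each rank posts non-blocking
 * sends/receives for its halo data, performs the bulk computation that does
 * not depend on that data, and only then waits for completion. On a
 * cross-site MPIg run, the wide-area latency is absorbed by step 2.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define HALO_SIZE 1024      /* illustrative halo message size */
#define BULK_SIZE 100000    /* illustrative local workload size */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;   /* illustrative ring neighbours */
    int right = (rank + 1) % size;

    double *send_halo = malloc(HALO_SIZE * sizeof(double));
    double *recv_halo = malloc(HALO_SIZE * sizeof(double));
    double *bulk      = malloc(BULK_SIZE * sizeof(double));
    for (int i = 0; i < HALO_SIZE; i++) send_halo[i] = rank;
    for (int i = 0; i < BULK_SIZE; i++) bulk[i] = 1.0;

    MPI_Request reqs[2];

    /* 1. Post the non-blocking halo exchange. */
    MPI_Irecv(recv_halo, HALO_SIZE, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(send_halo, HALO_SIZE, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* 2. Do the bulk of the local computation while messages are in flight;
          this work must not touch recv_halo. */
    double local = 0.0;
    for (int i = 0; i < BULK_SIZE; i++) local += bulk[i] * 0.5;

    /* 3. Complete the exchange, then do the small amount of work that
          depends on the received halo. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    for (int i = 0; i < HALO_SIZE; i++) local += recv_halo[i];

    printf("rank %d: local result %f\n", rank, local);

    free(send_halo); free(recv_halo); free(bulk);
    MPI_Finalize();
    return 0;
}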

DOI: 10.1145/1383422.1383459


Cite this paper

@inproceedings{Manos2008DistributedMC,
  title     = {Distributed {MPI} cross-site run performance using {MPIg}},
  author    = {Steven Manos and Marco D. Mazzeo and Owain Kenway and Peter V. Coveney and Nicholas T. Karonis and Brian R. Toonen},
  booktitle = {HPDC},
  year      = {2008}
}