Using MPI-2: Advanced Features of the Message Passing Interface

@inproceedings{Gropp2003UsingMA,
  title={Using MPI-2: Advanced Features of the Message Passing Interface},
  author={William Gropp and Ewing L. Lusk and Robert B. Ross and Rajeev Thakur},
  booktitle={IEEE International Conference on Cluster Computing},
  year={2003}
}
From the Publisher: The Message Passing Interface (MPI) specification is widely used for solving significant scientific and engineering problems on parallel computers. There exist more than a dozen implementations on computer platforms ranging from IBM SP-2 supercomputers to clusters of PCs running Windows NT or Linux ("Beowulf" machines). The initial MPI Standard document, MPI-1, was recently updated by the MPI Forum. The new version, MPI-2, contains both significant enhancements to the… 

Infrastructure For Performance Tuning MPI Applications

The goal of this project is to increase the level of performance-tool support for message-passing application programmers on clusters of workstations by adding support for LAM/MPI to the existing parallel performance tool, Paradyn.

Performance Tool Support for MPI-2 on Linux

PPerfMark, a new performance tool benchmark suite developed by the authors, is described, and results from using the enhanced version of the tool to examine the performance of several applications are presented.

Scheduling Dynamically Spawned Processes in MPI-2

This paper presents a scheduler module, implemented with MPI-2, that determines on-line (i.e. during execution) on which processor a newly spawned process should run, and with which priority.
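
Such a placement decision is ultimately expressed through MPI-2's MPI_Comm_spawn call. A minimal parent-side sketch in C is shown below; the worker binary name and the target host are placeholders, not taken from the paper.

#include <mpi.h>

/* Parent-side sketch: spawn one worker process and suggest a target host.
   "worker" and "node07" are placeholder names used only for illustration. */
int main(int argc, char **argv)
{
    MPI_Comm children;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* "host" is a reserved MPI info key for spawn; a scheduler module
       would choose this value on-line instead of hard-coding it. */
    MPI_Info_set(info, "host", "node07");

    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}

The "host" key is only a hint to the MPI implementation; the point of a scheduler module such as the one described above is to compute that value at run time rather than fixing it in the source.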

Improving the Dynamic Creation of Processes in MPI-2

This paper presents a scheduler module, implemented with MPI-2, that determines on-line (i.e. during execution) on which processor a newly spawned process should run.

On-line Scheduling of MPI-2 Programs with Hierarchical Work Stealing

This work presents an on-line scheduling algorithm, called Hierarchical Work Stealing, to obtain good load balancing of MPI-2 programs that follow a Divide & Conquer strategy; results show that the Hierarchical Work Stealing algorithm enables the use of MPI-2 with high efficiency, even on dynamic parallel HPC platforms that are not as homogeneous as clusters.

HARNESS fault tolerant MPI design, usage and performance issues

Performance of parallel communication and spawning primitives on a Linux cluster

This paper carefully compares the implementations of communication and spawning primitives in MPICH-2, openMosix, Linux remote procedure calls, forking, and various lower-level communication mechanisms, and reveals performance in certain circumstances well below the hardware specification.

A Component Architecture for the Message Passing Interface (MPI): The Systems Services Interface (SSI) of LAM/MPI

This work presents the design and implementation of a component system architecture in LAM/MPI, a production-quality, open-source implementation of the MPI-1 and MPI-2 standards, that is highly modular, has published abstraction and interface boundaries, and is significantly easier to develop, maintain, and use as a vehicle for research.

An Introduction to Parallel Programming Using MPI

This chapter serves as an introduction to the Message Passing Interface (MPI), which is a widely used library of code for writing parallel programs on distributed memory architectures.
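
As a point of reference for such an introduction, a minimal MPI program of the kind usually shown first (not taken from the chapter itself) sends a single integer from rank 0 to rank 1:

#include <mpi.h>
#include <stdio.h>

/* Minimal point-to-point example: rank 0 sends an integer to rank 1. */
int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run with, e.g., mpiexec -n 2 ./a.out, the program illustrates the SPMD style that MPI-1 established and that MPI-2 extends.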
...

References

Showing 1-10 of 63 references

A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard

The emergence of the MPI message passing standard for parallel computing

A Case for Using MPI's Derived Datatypes to Improve I/O Performance

This work explains how critical derived datatypes are for high I/O performance, why users should create and use them whenever possible, and how they enable implementations to perform optimizations.
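
As a hedged illustration of that argument (the 1024x1024 array, the output file name, and the assumption that the row count divides evenly among processes are invented here), a subarray datatype lets each process describe its noncontiguous file region once and hand the whole access to the MPI-IO layer in a single collective call:

#include <mpi.h>
#include <stdlib.h>

/* Sketch: each process writes its row block of a global 2-D array with one
   collective call, using a subarray datatype as the file view.  Sizes and
   the output file name are illustrative only. */
int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Datatype filetype;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int gsizes[2] = {1024, 1024};           /* global array dimensions   */
    int lsizes[2] = {1024 / nprocs, 1024};  /* this process's row block  */
    int starts[2] = {rank * lsizes[0], 0};  /* offset of the block       */
    double *local = malloc((size_t)lsizes[0] * lsizes[1] * sizeof(double));
    /* ... fill local[] with this process's data ... */

    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* The file view tells the MPI-IO layer which file bytes belong to this
       process; the write itself is then a single collective operation.   */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, local, lsizes[0] * lsizes[1], MPI_DOUBLE,
                       MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(local);
    MPI_Finalize();
    return 0;
}

Because the noncontiguous access pattern is visible to the implementation as a datatype, optimizations such as data sieving and collective buffering become possible, which is the point the paper argues.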

On implementing MPI-IO portably and with high performance

This work develops a high-performance, portable MPI-IO implementation, called ROMIO, that combines a large portion of portable code and a small portion of code that is optimized separately for different machines and file systems.

Dynamic process management in an MPI setting

  • W. Gropp, E. Lusk
  • Computer Science
    Proceedings of the Seventh IEEE Symposium on Parallel and Distributed Processing
  • 1995
Extensions to the Message-Passing Interface (MPI) Standard are proposed that provide for dynamic process management, including the spawning of new processes by a running application and connection to existing processes to support client/server applications.
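
On the other side of such a spawn, the child retrieves the intercommunicator that links it to its parent; a minimal sketch of the spawned program (the work and the single-integer result are illustrative) is:

#include <mpi.h>
#include <stdio.h>

/* Sketch of the spawned ("worker") side: obtain the intercommunicator to
   the parent created by MPI_Comm_spawn, then report a result back. */
int main(int argc, char **argv)
{
    MPI_Comm parent;
    int result = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        fprintf(stderr, "not started by MPI_Comm_spawn\n");
    } else {
        /* ... do the assigned work, then send a result to parent rank 0 ... */
        MPI_Send(&result, 1, MPI_INT, 0, 0, parent);
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}

The related MPI-2 calls MPI_Comm_accept and MPI_Comm_connect cover the client/server case mentioned in the abstract, where the processes to be joined already exist.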

Object Oriented MPI (OOMPI): a class library for the Message Passing Interface

This paper presents the requirements, analysis, and design for Object-Oriented MPI (OOMPI), a C++ class library for MPI: a generic object-oriented class library specification that uses C++ as the program description language.

Performance and experience with LAPI-a new high-performance communication library for the IBM RS/6000 SP

  • Gautam Shah, J. Nieplocha, C. A. Bender
  • Computer Science
    Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing
  • 1998
An overview of LAPI characteristics and its differences from other models such as MPI-2 is provided, and base performance parameters of LAPI, including latency and bandwidth, are presented and compared with the performance of MPI/MPL.

PMPIO-a portable implementation of MPI-IO

Preliminary results using the PMPIO implementation of MPI-IO show an improvement of as much as a factor of 20 on the NAS BTIO benchmark compared to a Fortran-based implementation.

Global Arrays: a portable "shared-memory" programming model for distributed memory computers

The key concept of GA is that it provides a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes.
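
Global Arrays has its own API; purely as an analogue (buffer size and values invented), the same idea of asynchronously reading a block of remote data without the owner's cooperation can be expressed with MPI-2's one-sided operations:

#include <mpi.h>
#include <stdio.h>

#define N 4

/* One-sided sketch (MPI-2, not the Global Arrays API): every process
   exposes a local buffer in a window, and rank 0 fetches rank 1's buffer
   with MPI_Get, with no matching call on rank 1's side.  Run with at
   least two processes. */
int main(int argc, char **argv)
{
    int rank, i;
    double local[N], remote[N];
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < N; i++)
        local[i] = rank * 100.0 + i;

    MPI_Win_create(local, N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0)
        MPI_Get(remote, N, MPI_DOUBLE, 1, 0, N, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    if (rank == 0)
        printf("first element fetched from rank 1: %g\n", remote[0]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}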
...