A large number of MPI implementations are currently available, each of which emphasizes different aspects of high-performance computing or is intended to solve a specific research problem. The result is a myriad of incompatible MPI implementations, all of which require separate installation, and the combination of which presents significant logistical ...
As high-performance clusters continue to grow in size and popularity, issues of fault tolerance and reliability are becoming limiting factors on application scalability. To address these issues, we present the design and implementation of a system for providing coordinated checkpointing and rollback recovery for MPI-based parallel applications. Our approach ...
TEG is a new methodology for point-to-point messaging developed as part of the Open MPI project. Initial performance measurements are presented, showing comparable ping-pong latencies in a single-NIC configuration, but with bandwidths up to 30% higher than those achieved by other leading MPI implementations. Homogeneous dual-NIC configurations further ...
TEG is a new component-based methodology for point-to-point messaging. Developed as part of the Open MPI project, TEG provides a configurable fault-tolerant capability for high-performance messaging that uses multiple network interfaces where available. Initial performance comparisons with other MPI implementations show comparable ping-pong latencies, but ...