Impact of event logger on causal message logging protocols for fault tolerant MPI

Abstract

Fault tolerance in MPI has become a major issue in the HPC community. Several approaches are envisioned, from user- or programmer-controlled fault tolerance to fully automatic fault detection and handling. For this last approach, several protocols have been proposed in the literature. In a recent paper, we have demonstrated that uncoordinated checkpointing…
DOI: 10.1109/IPDPS.2005.249

10 Figures and Tables
