Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications

@article{Guermouche2011UncoordinatedCW,
  title={Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic MPI Applications},
  author={Amina Guermouche and Thomas Ropars and Elisabeth Brunet and Marc Snir and Franck Cappello},
  journal={2011 IEEE International Parallel \& Distributed Processing Symposium},
  year={2011},
  pages={989--1000}
}
As many recent studies report, the mean time between failures of future post-petascale supercomputers is likely to decrease relative to current systems. The most popular fault tolerance approach for MPI applications on HPC platforms relies on coordinated checkpointing, which raises two major issues: a) global restart wastes energy, since all processes are forced to roll back even in the case of a single failure; b) checkpoint coordination may slow down the application execution because…
