Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery

Authors: E. N. Elnozahy and James S. Plank
Journal: IEEE Transactions on Dependable and Secure Computing
Over the past two decades, rollback-recovery via checkpoint-restart has been used with reasonable success for long-running applications, such as scientific workloads that take from a few hours to a few months to complete. Several commercial systems and publicly available libraries currently support various flavors of checkpointing. Programmers typically use these systems when they are satisfactory and otherwise embed checkpointing support within the application themselves. In this paper, we…