A Survey of Checkpoint / Restart Implementations

Eric Roman
In this paper we evaluate candidates for a checkpoint/restart implementation against a common set of requirements. We discuss the overall characteristics of the two main classes of checkpoint systems, library and system, followed by specific examples from existing systems, and present a detailed description of two system implementations. We conclude that no single publicly available implementation meets all requirements for a checkpoint/restart system for Linux clusters.