Design, Implementation, and Performance of Checkpointing in NetSolve

Adnan Agbaria and James S. Plank
While a variety of checkpointing techniques and systems have been documented for long-running programs, they are typically not available to programmers who are not systems experts. This paper details a project that integrates three technologies, NetSolve, Starfish, and IBP, for the seamless integration of fault tolerance into long-running applications. We discuss the design and implementation of this project and present performance results from executions on both local-area and wide-area networks.
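As a generic illustration of the checkpoint/restart idea behind this work, here is a minimal application-level sketch in Python. It is not the NetSolve, Starfish, or IBP mechanism; the function names (`save_checkpoint`, `load_checkpoint`, `run`), the JSON state file, and the fault-injection parameter `fail_at` are hypothetical. The loop periodically saves its state, and after a simulated failure a restart resumes from the last checkpoint instead of from the beginning.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename means a crash mid-write never leaves a half-written
    checkpoint behind (os.replace is atomic on POSIX filesystems).
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach stable storage
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"i": 0, "total": 0}


def run(path, n, interval, fail_at=None):
    """Sum i*i for i in [0, n), checkpointing every `interval` iterations.

    `fail_at` is hypothetical fault injection: raise at that iteration
    to mimic a crashed process.
    """
    state = load_checkpoint(path)  # resume if a checkpoint exists
    i, total = state["i"], state["total"]
    while i < n:
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at iteration {i}")
        total += i * i
        i += 1
        if i % interval == 0:
            save_checkpoint(path, {"i": i, "total": total})
    save_checkpoint(path, {"i": i, "total": total})
    return total


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "state.json")
    try:
        run(ckpt, 1000, 100, fail_at=550)
    except RuntimeError as err:
        print(err)
    print(run(ckpt, 1000, 100))  # prints 332833500
```

The checkpoint interval trades overhead against lost work: in this run the failure at iteration 550 forces iterations 500 through 549 to be recomputed after the restart, while a smaller interval would lose less work at the cost of more frequent writes.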

