Application checkpointing

Known as: CryoPID, Checkpoint, System checkpointing 
Checkpointing is a technique for adding fault tolerance to computing systems. It consists of saving a snapshot of the application's state, so…
Wikipedia
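
As a rough illustration of the snapshot idea in the definition above, the following Python sketch saves a loop's state to a file at a fixed interval and resumes from the last saved state after a simulated failure. It is a toy example under stated assumptions, not the method of any paper listed here or of CryoPID: the names `save_checkpoint`, `load_checkpoint`, and `run_sum`, the pickle file format, and the checkpoint interval are all hypothetical choices made for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then atomically rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run_sum(n, path, every=10, fail_at=None):
    """Sum 0..n-1, saving a checkpoint after every `every` completed steps.

    fail_at (hypothetical test hook) raises just before that step runs,
    standing in for a crash.
    """
    # Resume from the last checkpoint if one exists, else start fresh.
    state = load_checkpoint(path) or {"i": 0, "total": 0}
    while state["i"] < n:
        if fail_at is not None and state["i"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run_sum` a second time with the same path after a failure repeats only the steps since the last checkpoint, which is the trade-off the checkpoint interval controls: shorter intervals lose less work but write to disk more often.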

Papers overview

Highly Cited
2005
As high-performance clusters continue to grow in size and popularity, issues of fault tolerance and reliability are becoming…
Highly Cited
2004
Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most…
Highly Cited
2003
The running times of many computational science applications, such as protein-folding using ab initio methods, are much longer…
Highly Cited
1998
Diskless Checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without…
Highly Cited
1996
Checkpointing of parallel applications can be used as the core technology to provide process migration. Both, checkpointing and…
Highly Cited
1995
Checkpointing is a simple technique for rollback recovery: the state of an executing program is periodically saved to a disk file…
Highly Cited
1995
This paper describes our experience with the implementation and applications of the Unix checkpointing library libckp, and…
Highly Cited
1994
The integration of mobile/portable computing devices within existing data networks can be expected to spawn distributed…
Highly Cited
1992
Consistent checkpointing provides transparent fault tolerance for long-running distributed applications. In this paper we…
Highly Cited
1986
We consider the problem of bringing a distributed system to a consistent state after transient failures. We address the two…