Error Recovery in Multicomputers Using Global Checkpoints

Yuval Tamir and C. H. Sequin
Periodic checkpointing of the entire system state and rolling back to the last checkpoint when an error is detected is proposed as a basis for error recovery on a VLSI multicomputer executing non-interactive applications. Detailed algorithms for saving the checkpoints, distributing diagnostic information, and restoring a valid system state are presented. This approach places no restrictions on the actions of the application tasks, and, during normal computation, does not require the complex…
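The general idea of periodic global checkpointing with rollback can be illustrated with a small simulation. The sketch below is not the paper's algorithm: the `Node`, `System`, and `run` names, the in-memory deep copy used as the checkpoint, and the single injected error are all assumptions made for illustration. It only shows the state of every node being saved together and restored together.

```python
import copy


class Node:
    """Hypothetical compute node holding mutable task state."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.state = {"counter": 0}

    def step(self):
        # Stand-in for one unit of application work.
        self.state["counter"] += 1


class System:
    """Hypothetical multicomputer: a set of nodes plus one global checkpoint."""

    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]
        self.checkpoint = None
        self.checkpoint_step = 0
        self.save_checkpoint(0)

    def save_checkpoint(self, step):
        # Save the state of every node together (assumption: deep copy in memory).
        self.checkpoint = [copy.deepcopy(n.state) for n in self.nodes]
        self.checkpoint_step = step

    def rollback(self):
        # Restore every node to the last saved global state.
        for node, saved in zip(self.nodes, self.checkpoint):
            node.state = copy.deepcopy(saved)
        return self.checkpoint_step


def run(system, total_steps, interval, fail_at=None):
    """Run all nodes for total_steps, checkpointing every `interval` steps.

    If fail_at is given, an error is "detected" once at that step and the
    whole system rolls back to the last checkpoint and re-executes.
    """
    step = 0
    failed_once = False
    while step < total_steps:
        if fail_at is not None and step == fail_at and not failed_once:
            failed_once = True
            step = system.rollback()
            continue
        for node in system.nodes:
            node.step()
        step += 1
        if step % interval == 0:
            system.save_checkpoint(step)
    return [n.state["counter"] for n in system.nodes]
```

In this sketch a run with an injected error ends in the same final state as an error-free run, because the work done since the last checkpoint is simply repeated. A real multicomputer would also have to coordinate the nodes and handle messages in transit, which this toy example does not attempt.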




