Modular Checkpointing for Atomicity

Abstract

Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics for safe re-execution in multi-threaded code is not obvious, however. For a thread to correctly re-execute a region of code, it must ensure that all other threads which have witnessed its unwanted…
DOI: 10.1016/j.entcs.2007.04.008

12 Figures and Tables