Adaptive two-level blocking coordinated checkpointing based on recovery cost

@inproceedings{Lotfi2009AdaptiveTB,
  title={Adaptive two-level blocking coordinated checkpointing based on recovery cost},
  author={Mehdi Lotfi and Seyed Ahmad Motamedi and Mojtaba Bandarabadi},
  booktitle={2009 41st Southeastern Symposium on System Theory},
  year={2009},
  pages={113--117}
}
In this paper we introduce a new adaptive two-level blocking coordinated checkpointing scheme for cluster computing systems. The first level is local checkpointing: computing nodes save their checkpoints to local disk based on transient failure rates. If a transient failure occurs on a computing node, the process can recover from local disk. The second level is global checkpointing: computing nodes send their checkpoints to highly reliable global stable storage in the network…