Lazy Logging and Prefetch-Based Crash Recovery in Software Distributed Shared Memory Systems

Abstract

In this paper, we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called prefetch-based crash recovery (PCR), for software distributed shared memory (SDSM). Our lazy logging protocol minimizes failure-free overhead by logging only the data indispensable for correct recovery, while our PCR protocol reduces recovery time by prefetching data according to future memory access patterns, thus eliminating memory miss penalties during the recovery process. We have performed experiments on workstation clusters, comparing our protocols against the earlier reduced-stable logging (RSL) protocol by implementing both logging protocols in TreadMarks, a state-of-the-art SDSM system. The experimental results show that our lazy logging protocol consistently outperforms the RSL protocol: it increases execution time by only 1% to 4% during failure-free execution, whereas the RSL protocol incurs an execution time overhead of 6% to 21% due to its larger log size and higher disk access frequency. Our PCR protocol also outperforms the widely used simple crash recovery protocol by 18% to 57% across all applications examined.
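
To make the idea behind prefetch-based recovery concrete, the following toy model contrasts fetching pages on demand during replay with fetching them ahead of use. It is only an illustrative sketch, not the protocol from the paper: the `RecoveringNode` class, the `trace` of per-interval page accesses, and both recovery functions are hypothetical names, and the sketch assumes that the pages each replayed interval will touch are already known to the recovering node.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveringNode:
    """Toy stand-in for a node replaying its execution after a crash."""
    remote_pages: dict                      # page id -> contents held elsewhere (assumed)
    local_pages: dict = field(default_factory=dict)
    misses: int = 0                         # accesses that had to wait for a fetch

    def fetch(self, page_id):
        # Copy a page from the (simulated) remote store into local memory.
        self.local_pages[page_id] = self.remote_pages[page_id]

    def access(self, page_id):
        # An access to a page that is not yet local counts as a miss.
        if page_id not in self.local_pages:
            self.misses += 1
            self.fetch(page_id)
        return self.local_pages[page_id]


def simple_recovery(node, trace):
    """Replay every interval, fetching each page only when it is first touched."""
    for pages in trace:
        for page_id in pages:
            node.access(page_id)
    return node.misses


def prefetch_recovery(node, trace):
    """Replay every interval, fetching its pages before the replay touches them."""
    for pages in trace:
        for page_id in set(pages):
            if page_id not in node.local_pages:
                node.fetch(page_id)         # issued ahead of use, so no miss is recorded
        for page_id in pages:
            node.access(page_id)
    return node.misses


# Hypothetical data: three pages and three replayed intervals.
pages = {1: "a", 2: "b", 3: "c"}
trace = [[1, 2], [2, 3], [1]]
simple_misses = simple_recovery(RecoveringNode(dict(pages)), trace)
prefetch_misses = prefetch_recovery(RecoveringNode(dict(pages)), trace)
```

In this model the on-demand replay records one miss per distinct page, while the prefetching replay records none. The sketch ignores fetch latency, consistency actions, and logging entirely, so it says nothing about the measured speedups reported above.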

DOI: 10.1109/IPPS.1999.760507

8 Figures and Tables

Cite this paper

@inproceedings{Kongmunvattana1999LazyLA,
  title={Lazy Logging and Prefetch-Based Crash Recovery in Software Distributed Shared Memory Systems},
  author={Angkul Kongmunvattana and Nian-Feng Tzeng},
  booktitle={IPPS/SPDP},
  year={1999}
}