Proactive fault tolerance for HPC with Xen virtualization

@inproceedings{Nagarajan2007ProactiveFT,
  title={Proactive fault tolerance for HPC with Xen virtualization},
  author={Arun Babu Nagarajan and Frank Mueller and Christian Engelmann and Stephen L. Scott},
  booktitle={ICS},
  year={2007}
}
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current techniques to tolerate faults focus on reactive schemes to recover from faults and generally rely on a checkpoint/restart mechanism. Yet, in today's systems, node failures can often be anticipated by detecting a deteriorating health status. Instead of a reactive scheme for fault tolerance (FT), we are promoting a…
Highly Influential
This paper has highly influenced 17 other papers.
Highly Cited
This paper has 403 citations.


404 Citations

Citations per Year (chart; x-axis: 2009 to 2018, y-axis: 0 to 60 citations)
Semantic Scholar estimates that this publication has 404 citations based on the available data.

