High performance linpack benchmark: a fault tolerant implementation without checkpointing

@inproceedings{Davies2011HighPL,
  title={High performance linpack benchmark: a fault tolerant implementation without checkpointing},
  author={Teresa Davies and Christer Karlsson and Hui Liu and Chong Ding and Zizhong Chen},
  booktitle={ICS},
  year={2011}
}
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For long running applications using a large number of processors, it is essential that fault tolerance be used to prevent a total loss of all finished computations after a failure. While checkpointing has been very useful to tolerate failures for a long time, it often introduces a considerable overhead especially when…

Citations

Semantic Scholar estimates that this publication has 100 citations based on the available data.

References

Publications referenced by this paper.

  • A. Petitet, R. C. Whaley, J. Dongarra, A. Cleary. HPL - a portable implementation of the high-performance linpack benchmark for distributed-memory computers. 2008.
  • Y. Kim. Fault Tolerant Matrix Operations for Parallel and Distributed Systems. PhD thesis, 1996.
