Tests and Tolerances for High-Performance Software-Implemented Fault Detection

@article{Turmon2003TestsAT,
  title={Tests and Tolerances for High-Performance Software-Implemented Fault Detection},
  author={Michael J. Turmon and Robert A. Granat and Daniel S. Katz and John Z. Lou},
  journal={IEEE Trans. Computers},
  year={2003},
  volume={52},
  pages={579--591}
}
We describe and test a software approach to fault detection in common numerical algorithms. Such result checking or algorithm-based fault tolerance (ABFT) methods may be used, for example, to overcome single-event upsets in computational hardware or to detect errors in complex, high-efficiency implementations of the algorithms. Following earlier work, we use checksum methods to validate results returned by a numerical subroutine operating subject to unpredictable errors in data. We consider…
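
To make the checksum idea in the abstract concrete, here is a minimal sketch of a result check for matrix multiplication. It is an illustration only and not the authors' implementation: the function names, the checksum vector `w`, and the tolerance rule (a hypothetical safety factor `tau` times machine epsilon times norm-based scaling) are all assumptions chosen for this example. The check relies on the identity C w = A (B w), which holds when C = A B; the right-hand side costs only two matrix-vector products.

```python
import sys


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Plain matrix product; stands in for the routine being checked."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def inf_norm(M):
    """Infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)


def checksum_ok(A, B, C, w, tau=1e3):
    """Return True if C is consistent with A @ B under the checksum vector w.

    Compares C w against A (B w). The tolerance is an assumed heuristic:
    tau * machine epsilon * ||A|| * ||B|| * max|w|. It is not a bound
    taken from the paper.
    """
    eps = sys.float_info.epsilon
    tol = tau * eps * inf_norm(A) * inf_norm(B) * max(abs(x) for x in w)
    lhs = matvec(C, w)
    rhs = matvec(A, matvec(B, w))
    discrepancy = max(abs(x - y) for x, y in zip(lhs, rhs))
    return discrepancy <= tol
```

In use, one would compute `C = matmul(A, B)` and then call `checksum_ok(A, B, C, w)` with, say, a vector of ones for `w`. A corrupted entry of `C` changes `C w` by the size of the error times the matching entry of `w`, so the check fails when that product exceeds the tolerance. Picking a tolerance that separates rounding error from genuine faults is the hard part, and the fixed `tau` here is only a placeholder for it.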
This paper has 32 citations.
