## Parallel reduction to Hessenberg form with Algorithm-Based Fault Tolerance

- Yulu Jia, George Bosilca, Piotr Luszczek, Jack J. Dongarra
- 2013 SC - International Conference for High…
- 2013

@article{Hakkarinen2010AlgorithmicCF, title={Algorithmic Cholesky factorization fault recovery}, author={Douglas Hakkarinen and Zizhong Chen}, journal={2010 IEEE International Symposium on Parallel \& Distributed Processing (IPDPS)}, year={2010}, pages={1--10} }

Published 2010 in 2010 IEEE International Symposium on Parallel & Distributed Processing
DOI:10.1109/IPDPS.2010.5470436

Modeling and analysis of large scale scientific systems often use linear least squares regression, frequently employing Cholesky factorization to solve the resulting set of linear equations. With large matrices, this often will be performed in high performance clusters containing many processors. Assuming a constant failure rate per processor, the probability of a failure occurring during the execution increases linearly with additional processors. Fault tolerant methods attempt to reduce the…

