Algorithm-Based Fault Tolerance on a Hypercube Multiprocessor

  title={Algorithm-Based Fault Tolerance on a Hypercube Multiprocessor},
  author={Prithviraj Banerjee and Joseph T. Rahmeh and Craig B. Stunkel and Suku Nair and Kaushik Roy and Vijay Balasubramanian and Jacob A. Abraham},
  journal={IEEE Trans. Computers},
Abstract: Hypercube multiprocessors have recently offered a cost-effective and feasible approach to supercomputing through parallelism at the processor level, directly connecting a large number of low-cost processors with local memories that communicate by message passing instead of shared variables. This paper discusses the design of a fault-tolerant hypercube multiprocessor architecture. Most of the recently proposed schemes of fault tolerance in parallel architectures address mainly the…
This paper has 122 citations.


Publications citing this paper.

Robust assertions and fail-bounded behavior

Journal of the Brazilian Computer Society • 2005

Online Algorithm-Based Fault Tolerance for Cholesky Decomposition on Heterogeneous Systems with GPUs

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) • 2016

Algorithm Level Fault Tolerance for Molecular Dynamic Applications

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) • 2015



Publications referenced by this paper.

Fault-tolerant systems for the computation of eigenvalues and singular values

J. A. Abraham
Proc. SPIE, Advanced Algorithms Architectures Signal Processing • 1990


A concurrent error detecting conjugate gradient algorithm on a hypercube multiprocessor

C. Aykanat, F. Ozguner
Proc. 17th Int. Symp. Fault-Tolerant Comput. • 1987

Fault diagnosis in fully distributed systems

S. M. Reddy
Proc. 16th Int. Symp. Fault-Tolerant Comput. • 1986
