Distributed Fault-Tolerance For Large Multiprocessor Systems

Jon G. Kuhl and Sudhakar M. Reddy

Techniques for dealing with hardware failures in very large networks of distributed processing elements are presented. A concept known as distributed fault-tolerance is introduced. A model of a large multiprocessor system is developed and techniques, based on this model, are given by which each processing element can correctly diagnose failures in all other processing elements in the system. The effect of varying system interconnection structures upon the extent and efficiency of the diagnosis…