Fault Tolerant Scheduling in Distributed Networks

@inproceedings{Weissman1996FaultTS,
  title={Fault Tolerant Scheduling in Distributed Networks},
  author={Jon B. Weissman},
  year={1996}
}
We present a model for application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on a full replication of all parallel application components in a distributed wide-area environment in which each replica is independently scheduled in a different site…
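
As a rough illustration of why placing each replica at a different site can raise reliability, the sketch below computes the probability that at least one replica completes. It is not taken from the paper: the function name, the per-site failure probabilities, and the assumption that sites fail independently are all hypothetical and chosen only for illustration.

```python
from math import prod


def replicated_success_probability(site_failure_probs):
    """Return the probability that at least one replica completes.

    Assumption (not from the paper): each replica runs at a different
    site, and sites fail independently with the given probabilities.
    """
    for p in site_failure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"failure probability out of range: {p}")
    # The application fails only if every replica's site fails.
    return 1.0 - prod(site_failure_probs)


if __name__ == "__main__":
    # Hypothetical example: three replicas, each site failing 10% of the time.
    print(replicated_success_probability([0.1, 0.1, 0.1]))
```

Under these assumptions each added replica multiplies the overall failure probability by that site's failure probability, so reliability improves quickly with the number of sites. If site failures are correlated, the gain is smaller than this sketch suggests.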