Fault Tolerant Scheduling in Distributed Networks

Jon B. Weissman
We present a model for application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on a full replication of all parallel application components in a distributed wide-area environment in which each replica is independently scheduled in a different site…