Application Level Fault Tolerance in Heterogeneous Networks of Workstations

@article{Beguelin1997ApplicationLF,
  title={Application Level Fault Tolerance in Heterogeneous Networks of Workstations},
  author={Adam Beguelin and Erik Seligman and Peter Stephan},
  journal={J. Parallel Distrib. Comput.},
  year={1997},
  volume={43},
  pages={147--155}
}
We have explored methods for checkpointing and restarting processes within the Distributed Object Migration Environment (Dome), a C++ library of data-parallel objects that are automatically distributed over heterogeneous networks of workstations (NOWs). System-level checkpointing methods, although transparent to the user, were rejected because they lack…