Increasing Reliability through Dynamic Virtual Clustering

@inproceedings{Emeneker2006IncreasingRT,
  title={Increasing Reliability through Dynamic Virtual Clustering},
  author={Wesley Emeneker and D. Stanzione and Ira A. Fulton},
  year={2006}
}
In a scientific community that increasingly relies upon High Performance Computing (HPC) for large scale simulations and analysis, the reliability of hardware and applications devoted to HPC is extremely important. While hardware reliability is not likely to dramatically increase in the coming years, software must be able to provide the reliability required by demanding applications. One way to increase the reliability of HPC systems is to use checkpointing to save the state of an application…

References

J. Duell, P. Hargrove, E. Roman. The Design and Implementation of Berkeley Lab’s Linux Checkpoint/Restart. 2003.

Victor C. Zandy and Barton P. Miller. Reliable network connections. 2002.
