Corpus ID: 7602007

Increasing Reliability through Dynamic Virtual Clustering

@inproceedings{Emeneker2006IncreasingRT,
  title={Increasing Reliability through Dynamic Virtual Clustering},
  author={Wesley Emeneker and D. Stanzione and I. Fulton},
  year={2006}
}
In a scientific community that increasingly relies on High Performance Computing (HPC) for large-scale simulations and analysis, the reliability of the hardware and applications devoted to HPC is extremely important. Because hardware reliability is unlikely to increase dramatically in the coming years, software must provide the reliability that demanding applications require. One way to increase the reliability of HPC systems is to use checkpointing to save the state of an application…
10 Citations

Cluster-wide context switch of virtualized jobs
Combining batch execution and leasing using virtual machines
Saline: Improving Best-Effort Job Management in Grids
Dynamic Fractional Resource Scheduling vs. Batch Scheduling
Dynamic fractional resource scheduling for HPC workloads
Virtual Organization Clusters: Self-provisioned clouds on the grid
Changement de contexte pour tâches virtualisées à l'échelle des grappes
