Probabilistic QoS guarantees for supercomputing systems

Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, José E. Moreira, Manish Gupta
2005 International Conference on Dependable Systems and Networks (DSN'05)

Supercomputing systems must complete their assigned workloads reliably and efficiently, even in the presence of failures. This paper proposes a scheme in which the system and its users negotiate a mutually desirable risk strategy. To accomplish this, the system makes probabilistic guarantees on quality of service (QoS) of the form "Job j can be completed by deadline d with probability p". To make such guarantees, the system uses event prediction (forecasting) in…
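
To make the form of the guarantee concrete, the following is a minimal sketch of how a (deadline, probability) offer could be computed from failure predictions. It is not the paper's algorithm: the names PredictedFailure, completion_probability and quote are hypothetical, and the model assumes that predicted failures are independent, that any failure on one of the job's nodes during its run prevents on-time completion, and that no checkpointing is used.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class PredictedFailure:
    """Hypothetical record emitted by a failure predictor (an assumption)."""
    node: int
    time: float         # predicted failure time, in hours
    probability: float  # predictor's confidence that the failure occurs


def completion_probability(nodes, start, runtime, predictions):
    """Probability that no predicted failure hits the job's nodes while it runs.

    Assumes independent failures, so the survival probabilities multiply.
    """
    end = start + runtime
    return prod(
        1.0 - f.probability
        for f in predictions
        if f.node in nodes and start <= f.time < end
    )


def quote(nodes, runtime, candidate_starts, predictions):
    """Return one (deadline d, probability p) offer per candidate start time."""
    return [
        (s + runtime, completion_probability(nodes, s, runtime, predictions))
        for s in candidate_starts
    ]


predictions = [
    PredictedFailure(node=0, time=3.0, probability=0.2),
    PredictedFailure(node=1, time=5.0, probability=0.5),
]
offers = quote({0, 1}, runtime=4.0, candidate_starts=[0.0, 6.0],
               predictions=predictions)
for deadline, p in offers:
    print(f"Job can be completed by t={deadline} with probability {p:.2f}")
```

In this toy setting, a user who accepts more risk can take the earlier deadline at a lower probability, while a risk-averse user can choose the later start that avoids both predicted failures. That choice is the kind of negotiation the abstract describes.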
This paper has 19 citations.