Faults in Large Distributed Systems and What We Can Do About Them

@inproceedings{Kola2005FaultsIL,
  title={Faults in Large Distributed Systems and What We Can Do About Them},
  author={George Kola and Tevfik Kosar and Miron Livny},
  booktitle={Euro-Par},
  year={2005}
}
Scientists are increasingly using large distributed systems built from commodity off-the-shelf components to perform scientific computation. Grid computing has expanded the scale of such systems by spanning them across organizations. While such systems are cost-effective, the use of a large number of commodity components causes high fault and failure rates. Some of these faults result in silent data corruption, leaving users with possibly incorrect results. In this work, we analyzed the faults…