Learn More
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L which can accommodate as many as 128 K processors. One of the challenges when designing and deploying these systems in a production setting is the need to take failure occurrences, whether it be(More)
  • 1