Fault-tolerance, malleability and migration for divide-and-conquer applications on the grid

Authors: Gosia Wrzesinska, Rob van Nieuwpoort, Jason Maassen, Henri E. Bal
Published in: 19th IEEE International Parallel and Distributed Processing Symposium
Pages: 10 pp.
Grid applications have to cope with dynamically changing computing resources, as machines may crash or be claimed by other, higher-priority applications. In this paper, we propose a mechanism that enables fault-tolerance, malleability (i.e., the ability to cope with a dynamically changing number of processors) and migration for divide-and-conquer applications on the grid. The novelty of our approach is restructuring the computation tree, which eliminates redundant computation and salvages partial…
This paper has 54 citations.



Publications citing this paper.

Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing

IEEE Transactions on Dependable and Secure Computing • 2009

A new fault-tolerance framework for grid computing

Multiagent and Grid Systems • 2006

Fault Tolerance Schemes for Global Load Balancing in X10

Scalable Computing: Practice and Experience • 2015

Fault-Tolerant Global Load Balancing in X10

2014 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing • 2014

Semantic Scholar estimates that this publication has 55 citations based on the available data.



Publications referenced by this paper.

GGF Grid Checkpoint Recovery Working Group Charter

D. Simmel, T. Kielmann, N. Stone
Global Grid Forum, January • 2004

A Malleable-Job System for Timeshared Parallel Machines

2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'02) • 2002

A Grid programming primer

C. Lee, S. Matsuoka, +4 authors J. Saltz
Global Grid Forum, August • 2001
