Accurate application progress analysis for large-scale parallel debugging

Authors: Subrata Mitra, Ignacio Laguna, Dong H. Ahn, Saurabh Bagchi, Martin Schulz, Todd Gamblin

Debugging large-scale parallel applications is challenging. In most HPC applications, parallel tasks progress in a coordinated fashion, so a fault in one task can quickly propagate to other tasks, making the fault difficult to debug. Finding the least-progressed tasks can significantly reduce the effort needed to identify the task where the fault originated. However, existing approaches for detecting them suffer from low accuracy and large overheads; they either use imprecise static analysis or are unable…
This paper has 25 citations.

Key Quantitative Results

  • Our fault-injection experiments suggest that its accuracy and precision are over 90% for most cases and that it scales well up to 16,384 MPI tasks.

