Re-execution of distributed programs to detect bugs hidden by racing messages

@inproceedings{Kilgore1997ReexecutionOD,
  title={Re-execution of distributed programs to detect bugs hidden by racing messages},
  author={Richard B. Kilgore and Craig M. Chase},
  booktitle={Proceedings of the Thirtieth Hawaii International Conference on System Sciences},
  year={1997},
  volume={1},
  pages={423--432}
}
Finding errors in non-deterministic programs is complicated by the fact that an anomaly may occur during one program execution and not the next. Our objective is to provide a practical, yet powerful testing environment for distributed systems, using re-execution. We focus on re-executing the program under a strictly different message ordering. We show that messages are grouped into waves, such that any two messages from different waves must always be received in the same order. We provide an…

Citations

Publications citing this paper.

Dynamic testing of flow graph based parallel applications

CITES BACKGROUND
HIGHLY INFLUENCED

Decomposing Partial Order Execution Graphs to Improve Message Race Detection

CITES BACKGROUND
HIGHLY INFLUENCED

Visualization of Message Races in MPI Parallel Programs

CITES BACKGROUND

Testing concurrent software systems

CITES BACKGROUND

Preparing for replay

CITES METHODS & BACKGROUND