To Run What No One Has Run Before

Abstract

When program verification fails, it is often hard to understand what went wrong in the absence of concrete executions that expose the parts of the implementation or specification responsible for the failure. Automatically generating such tests would require "executing" the complex specifications typically used for verification (with unbounded quantification and other expressive constructs), which is beyond the capabilities of standard testing tools. This paper presents a technique to automatically generate executions of programs annotated with complex specifications, and its implementation for the Boogie intermediate verification language. Our approach combines symbolic execution and SMT constraint solving to generate small tests that are easy to read and understand. The evaluation on several program verification examples demonstrates that our test case generation technique can help understand failed verification attempts in conditions where traditional testing is not applicable, thus making formal verification techniques easier to use in practice.

Introduction

Static program verification has made tremendous progress, and is now being applied to real programs [15,10] well beyond the scale of "toy" examples. These achievements are impressive, but they still require massive effort and highly trained experts. One of the biggest remaining obstacles is understanding failed verification attempts [18]. Most difficulties in this area stem from inherent limits of static verification, and hence could benefit from complementary dynamic techniques.

Static program proving techniques, implemented in tools such as Boogie [16], Dafny [17], and VeriFast [7], are necessarily incomplete, since they target undecidable problems. Incompleteness implies that program verifiers are "best effort": when they fail, this is not conclusive evidence of an error. It may well be that the specification is sound but insufficient to prove the implementation correct; for example, a loop invariant may be too weak to establish the postcondition. Even leaving the issue of incomplete specifications aside, the feedback provided by failed verification attempts is often of little use in understanding the ultimate source of the failure. A typical error message states that some executions might violate a certain assertion; but without concrete input values that trigger the violation, it is difficult to understand which parts of the program should be adjusted. And even when verification is successful, it would still be useful to have "sanity checks" in the form of concrete executions, to increase confidence that the written specification is not only consistent but also sufficiently detailed to capture the intended program behavior.
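As a small illustrative example (hypothetical, not taken from the paper), consider a Boogie procedure whose implementation is correct but whose loop invariant is too weak to establish the postcondition:

    // Computes 1 + 2 + ... + n; the implementation is correct,
    // but the loop invariant says nothing about s.
    procedure SumUpTo(n: int) returns (s: int)
      requires n >= 0;
      ensures s == (n * (n + 1)) div 2;
    {
      var i: int;
      i := 0;
      s := 0;
      while (i < n)
        invariant 0 <= i && i <= n;  // too weak to prove the postcondition
      {
        i := i + 1;
        s := s + i;
      }
    }

The verifier rejects this procedure, reporting only that the postcondition might not hold. A concrete execution such as n = 2, which terminates with s = 3 and satisfies the postcondition, suggests that the implementation is fine and that the invariant is what needs strengthening, for instance by adding s == (i * (i + 1)) div 2.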
