On the Techniques We Create, the Tools We Build, and Their Misalignments: A Study of KLEE

@article{Rizzi2016OnTT,
  title={On the Techniques We Create, the Tools We Build, and Their Misalignments: A Study of KLEE},
  author={Eric F. Rizzi and Sebastian G. Elbaum and Matthew B. Dwyer},
  journal={2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)},
  year={2016},
  pages={132--143}
}
Our community constantly pushes the state-of-the-art by introducing “new” techniques. These techniques often build on top of, and are compared against, existing systems that realize previously published techniques. The underlying assumption is that existing systems correctly represent the techniques they implement. This paper examines that assumption through a study of KLEE, a popular and well-cited tool in our community. We briefly describe six improvements we made to KLEE, none of which can…


Automatic Software Repair: A Survey
TLDR
Surveys program repair techniques, a new class of approaches whose key idea is to try to automatically repair software systems by producing an actual fix that can be validated by the testers before it is finally accepted, or that is adapted to properly fit the system.
Reliable benchmarking: requirements and solutions
Benchmarking is a widely used method in experimental computer science, in particular, for the comparative evaluation of tools and algorithms. As a consequence, a number of questions need to be…
Exploring and exploiting the correlations between bug-inducing and bug-fixing commits
TLDR
The empirical findings reveal important and significant correlations between a bug's inducing and fixing commits and explain why the SZZ algorithm, the most widely-adopted approach to collecting bug-inducing commits, is imprecise.
Comparing developer-provided to user-provided tests for fault localization and automated program repair
TLDR
Provides evidence that developer-provided tests are more targeted toward the defect and encode more information than user-provided tests, and offers suggestions for improving the design and evaluation of fault localization and automated program repair techniques.
Input Test Suites for Program Repair: A Novel Construction Method Based on Metamorphic Relations
TLDR
This article proposes a novel method of constructing APR input test suites using information derived from violated metamorphic relations, and empirically evaluates this approach against random and code-coverage-based construction methods, which serve as the experimental control.
2 Fuzzing the Heartbleed-Introducing Source Code Commit
TLDR
Introduces Directed Greybox Fuzzing (DGF), which generates inputs with the objective of efficiently reaching a given set of target program locations, shows applications of DGF to patch testing and crash reproduction, and discusses the integration of AFLGo into Google’s continuous fuzzing platform OSS-Fuzz.
Systematic comparison of symbolic execution systems: intermediate representation and its generation
TLDR
This work develops a methodology for systematic comparison of different approaches to symbolic execution, and uses it to evaluate the impact of the choice of IR and IR generation.
A review of software engineering research from a design science perspective
TLDR
The design science lens helps to pinpoint the theoretical contribution of a research output, which is the core for assessing the practical relevance and novelty of the prescribed rule as well as the rigor of applied empirical methods in support of the rule.
Coverage-Based Greybox Fuzzing as Markov Chain
TLDR
AFLFast is compared to the symbolic executor KLEE in terms of vulnerability detection and code coverage; AFLFast only slightly outperforms KLEE, while a combination of both tools achieves the best results by mitigating their individual weaknesses.
Directed Greybox Fuzzing
TLDR
This paper introduces Directed Greybox Fuzzing (DGF), which generates inputs with the objective of efficiently reaching a given set of target program locations, and develops and evaluates a simulated annealing-based power schedule that gradually assigns more energy to seeds that are closer to the target locations while reducing energy for seeds that are further away.

References

Showing 1-10 of 103 references
MintHint: automated synthesis of repair hints
TLDR
MintHint is a novel technique for program repair that departs from most of today’s approaches: instead of trying to fully automate program repair, it performs statistical correlation analysis to identify expressions that are likely to occur in the repaired code and generates repair hints from these expressions.
Sahara: Guiding the debugging of failed software upgrades
TLDR
Argues that debugging failed upgrades can be simplified by exploiting the characteristics of upgrade problems to prioritize the set of routines to consider, and designs and implements Sahara, a system that identifies the aspects of the environment most likely to be the culprits of the misbehavior and finds the subset of routines that relate to those aspects.
KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs
TLDR
Presents KLEE, a new symbolic execution tool capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally intensive programs, significantly beating the coverage of the developers’ own hand-written test suites.
Repeatability and Benefaction in Computer Systems Research — A Study and a Modest Proposal
TLDR
A novel sharing specification scheme is proposed that requires researchers to specify the level of sharing that reviewers and readers can assume from a paper.
Revealing and repairing configuration inconsistencies in large-scale system software
TLDR
This work proposes an approach that extracts variability from both the C preprocessor and configuration models into propositional logic. This reveals inconsistencies between the variability expressed by the C preprocessor and an explicit variability model, which manifest themselves as seemingly conditional code that is in fact unconditional.
A Synergistic Analysis Method for Explaining Failed Regression Tests
TLDR
Proposes a new automated debugging method for regression testing based on the synergistic application of dynamic and semantic analysis, iteratively applying dynamic analysis and constraint-solver-based semantic analysis to leverage their complementary strengths.
iTree: Efficiently Discovering High-Coverage Configurations Using Interaction Trees
TLDR
The improved iTree algorithm is highly scalable and can identify a high-coverage test set of configurations more effectively than existing methods. The key improvements are based on composite proto-interactions, a construct that improves iTree’s ability to correctly learn key configuration option combinations.
Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact
TLDR
Describes the infrastructure being designed and constructed to support controlled experimentation with testing and regression testing techniques, and discusses the impact that this infrastructure has had and can be expected to have.
BugRedux: Reproducing field failures for in-house debugging
  • Wei Jin, A. Orso
  • 2012 34th International Conference on Software Engineering (ICSE)
TLDR
The results are promising in that they show that it is possible to synthesize in-house executions that reproduce failures observed in the field using a suitable set of execution data.
Selecting peers for execution comparison
TLDR
Implements five existing techniques for finding peers, examines their impact on 20 real bugs, and presents a metric for evaluating peer quality based on the similarity of the peer to the executions of the patched programs.