Finding and understanding bugs in C compilers

@inproceedings{Yang2011FindingAU,
  title={Finding and understanding bugs in C compilers},
  author={Xuejun Yang and Yang Chen and Eric Eide and John Regehr},
  booktitle={PLDI '11},
  year={2011}
}
Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first… 
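The abstract distinguishes two failure classes: compiler crashes and silently generated wrong code. A common way to detect the second class with randomly generated programs is differential testing: compile the same valid program with several compiler configurations, run each binary, and compare the outputs. The following is a minimal sketch of that comparison step only, assuming each test program prints a single checksum line. The function name `classify_results`, the dictionary layout, and the strict-majority rule are illustrative assumptions, not Csmith's actual implementation.

```python
from collections import Counter


def classify_results(results):
    """Sort per-compiler results into crashes and suspected wrong-code bugs.

    `results` is a hypothetical mapping from a compiler configuration name
    (e.g. "gcc -O2") to the output printed by the compiled test program,
    or None if the compiler crashed on the input.
    """
    # A missing output stands for a compiler crash in this sketch.
    crashed = sorted(name for name, out in results.items() if out is None)
    outputs = {name: out for name, out in results.items() if out is not None}

    majority = None
    suspects = []
    if outputs:
        value, votes = Counter(outputs.values()).most_common(1)[0]
        # Assumption: only a strict majority is trusted as the reference result.
        if votes * 2 > len(outputs):
            majority = value
            suspects = sorted(n for n, out in outputs.items() if out != value)

    return {"crashed": crashed, "majority": majority, "suspects": suspects}


if __name__ == "__main__":
    example = {
        "gcc -O0": "checksum = 1A2B",
        "gcc -O2": "checksum = 1A2B",
        "clang -O2": "checksum = FFFF",
        "clang -O3": None,
    }
    print(classify_results(example))
```

When no strict majority exists, the sketch reports no suspects rather than guessing which compiler is wrong; a real harness would also need timeouts and a guarantee that the test program has no undefined behavior.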

Configuring test generators using bug reports: a case study of GCC compiler and Csmith
TLDR
This work uses the code snippets in GCC bug reports to identify language features that are more prone to inadequate implementation, and uses these insights to guide test generators, which then provide higher coverage and trigger more miscompilation failures than state-of-the-art test generation techniques for GCC.
Translation Validation for the LLVM Compiler
TLDR
Alive is extended to 1) reduce the SMT formula sizes generated to improve performance and 2) significantly increase analysis coverage with a new loop unrolling algorithm for loops written in Alive IR.
Finding missed compiler optimizations by differential testing
TLDR
This paper investigates whether the quality of generated code can be improved by comparing the code generated by different compilers to find optimizations performed by one but missed by another, and develops a set of tools for running tests.
K-CONFIG: Using Failing Test Cases to Generate Test Cases in GCC Compilers
TLDR
K-CONFIG is described, an approach that uses the bugs reported in the GCC repository to generate new test inputs that can trigger up to 36 miscompilation failures and 179 crashes, while Csmith with the default configuration did not trigger any failures.
Finding typing compiler bugs
TLDR
This work presents a testing framework for validating static typing procedures in compilers, along with two novel approaches (type erasure mutation and type overwriting mutation) that apply targeted transformations to an input program to reveal type inference and soundness compiler bugs, respectively.
Random testing for C and C++ compilers with YARPGen
TLDR
Yet Another Random Program Generator (YARPGen) is a random test-case generator for C and C++ that was used to find and report more than 220 bugs in GCC, LLVM, and the Intel® C++ Compiler.
How a simple bug in ML compiler could be exploited for backdoors?
TLDR
This study aims to show how a compiler bug can be audited and possibly corrected, and shows that even old and mature compilers can present bugs.
Test-case reduction for C compiler bugs
TLDR
It is concluded that effective program reduction requires more than straightforward delta debugging, so three new, domain-specific test-case reducers are designed and implemented based on a novel framework in which a generic fixpoint computation invokes modular transformations that perform reduction operations.
Compiler fuzzing: how much does it matter?
TLDR
The first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler is presented, and a selection of the syntactic changes caused by some of the bugs (fuzzer-found and non-fuzzer-found) in package assembly code shows that either these changes have no semantic impact or that they would require very specific runtime circumstances to trigger execution divergence.

References

Random testing of C calling conventions
In a C compiler, function calls are difficult to implement correctly because they must respect a platform-specific calling convention. But they are governed by a simple invariant: parameters passed
Practical testing of a C99 compiler using output comparison
TLDR
A simple technique is presented for testing a C99 compiler, by comparing its output with the output from pre‐existing tools, which found several hundred bugs, mostly in in‐house code, but also in longstanding high‐quality front‐ and back‐end code from Edison Design Group and Apogee Software.
Compiler test case generation methods: a survey and assessment
Volatiles are miscompiled, and what to do about it
TLDR
Access summary testing is presented: an efficient, practical, and automatic way to detect code-generation errors related to the volatile qualifier, and a workaround is presented for the compiler defects discovered.
Automated test program generation for an industrial optimizing compiler
TLDR
The script-driven test program generation process in JTT is shown, along with how to produce test programs automatically from a temporal-logic model of compiler optimizations, so as to guarantee the execution of the optimizing modules under test during compilation.
Bringing extensibility to verified compilers
TLDR
Leroy's CompCert extension, XCert is extended, including the details of its execution engine and proof of correctness in Coq, and it is proved that this execution engine preserves program semantics, using the Coq proof assistant.
seL4: formal verification of an OS kernel
TLDR
To the authors' knowledge, this is the first formal proof of functional correctness of a complete, general-purpose operating-system kernel.
An empirical study of the reliability of UNIX utilities
The following section describes the tools we built to test the utilities. These tools include the fuzz (random character) generator, ptyjig (to test interactive utilities), and scripts to automate
Grammar-based whitebox fuzzing
TLDR
Results of the experiments show that grammar-based whitebox fuzzing explores deeper program paths and avoids dead ends due to non-parsable inputs, and that it increased coverage of the code generation module of the IE7 JavaScript interpreter from 53% to 81% while using three times fewer tests.