BEARS: An Extensible Java Bug Benchmark for Automatic Program Repair Studies

Fernanda Madeiral Delfim, Simon Urli, Marcelo de Almeida Maia, Martin Monperrus
2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)
Benchmarks of bugs are essential to empirically evaluate automatic program repair tools. In this paper, we present BEARS, a project for collecting and storing bugs into an extensible bug benchmark for automatic repair studies in Java. The collection of bugs relies on commit building state from Continuous Integration (CI) to find potential pairs of buggy and patched program versions from open-source projects hosted on GitHub. Each pair of program versions passes through a pipeline where an… 
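
The idea of using CI build states to find candidate pairs can be illustrated with a minimal sketch. It assumes one simple reading of the abstract: a failed build immediately followed by a passed build suggests a buggy version and its patched version. The `Build` class, the status labels, and the `candidate_pairs` function are hypothetical names introduced for illustration; they are not the BEARS implementation, and the pipeline that validates each pair is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Build:
    """One CI build record (hypothetical structure, for illustration only)."""
    commit: str   # identifier of the commit the build ran on
    status: str   # assumed labels: "passed" or "failed"


def candidate_pairs(builds: List[Build]) -> List[Tuple[str, str]]:
    """Return (buggy_commit, patched_commit) candidates.

    Assumption: a failed build directly followed by a passed build
    marks a potential buggy/patched pair. Builds are expected in
    chronological order.
    """
    pairs = []
    for previous, current in zip(builds, builds[1:]):
        if previous.status == "failed" and current.status == "passed":
            pairs.append((previous.commit, current.commit))
    return pairs


# Made-up build history used only to exercise the function.
history = [
    Build("a1", "passed"),
    Build("b2", "failed"),
    Build("c3", "passed"),
    Build("d4", "passed"),
]
print(candidate_pairs(history))
```

Pairs found this way are only candidates: a build can fail for reasons unrelated to a bug, which is why the abstract describes a subsequent pipeline that each pair must pass through.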

BugBuilder: An Automated Approach to Building Bug Repository

This paper proposes BugBuilder, an automatic approach to constructing bug repositories from version control systems that extracts complete and concise bug-fixing patches and excludes bug-irrelevant changes. The authors used the approach to build a bug repository called GrowingBugs.

Should We Add Repair Time to an Unfixed Bug?

This paper designs VANFIX, a simple and effective repair method for small-scale C programs that leverages the probability of exploring the search space to conduct a variable search neighborhood for potential patches, rather than patching suspicious statements one by one.

A large-scale study on human-cloned changes for automated program repair

This paper analyzes 3,049 multi-hunk patches from the ManySStuBs4J dataset and concludes that automated solutions for creating patches composed of identical or similar changes can be useful for fixing bugs.

Extracting Concise Bug-Fixing Patches from Human-Written Patches in Version Control Systems

This paper proposes BugBuilder, an automatic approach to extracting complete and concise bug-fixing patches from human-written patches in version control systems, and suggests that its precision was even higher than that of human experts.

A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark

This paper performs an automatic repair experiment on a benchmark called QuixBugs that has never been studied in the context of program repair, and proposes three patch correctness assessment techniques to comprehensively study overfitting and incorrect patches.

Constructing Regression Dataset from Code Evolution History

This work addresses the challenges of identifying potential regression-fixing commits from the code evolution history, migrating the test and its code dependencies over the history, and minimizing the compilation overhead during the regression search, and builds the largest replicable regression dataset within the shortest period.

A Comprehensive Study of Code-removal Patches in Automated Program Repair

It is revealed that code-removal patches are often insufficient to fix bugs, and a comprehensive taxonomy of code-removal patches is proposed that provides evidence of the problems that may affect test suites, opening new opportunities for researchers in the field of automatic program repair.

Empirical review of Java program repair tools: a large-scale experiment on 2,141 bugs and 23,551 repair attempts

A large-scale experiment using 11 Java test-suite-based repair tools and 2,141 bugs from 5 benchmarks is presented to have a better understanding of the current state of automatic program repair tools on a large diversity of benchmarks.

On the Efficiency of Test Suite based Program Repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

This large-scale empirical study assesses the efficiency, in terms of the quantity of generated patch candidates, of 16 open-source repair tools for Java programs, and notes that current template-based repair systems are actually the least efficient, as they tend to generate mostly irrelevant patch candidates.



Defects4J: a database of existing faults to enable controlled testing studies for Java programs

Defects4J is a database and extensible framework that provides real bugs to enable reproducible studies in software testing research. It also provides a high-level interface to common tasks in software testing research, making it easy to conduct and reproduce empirical studies.

Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs

Empirical analysis on Nopol shows that the approach can effectively fix bugs with buggy if conditions and missing preconditions on two large open-source projects, namely Apache Commons Math and Apache Commons Lang.

BugBench: Benchmarks for Evaluating Bug Detection Tools

This paper summarizes the general guidelines on the criteria for selecting representative bug benchmarks, and the metrics for evaluating a bug detection tool, and presents a set of buggy applications collected by us, with various types of software bugs.

Codeflaws: A Programming Competition Benchmark for Evaluating Automated Program Repair Tools

This paper presents Codeflaws, a set of 3902 defects from 7436 programs automatically classified across 39 defect classes, where the defect classes are types of fault derived from the syntactic differences between a buggy program and a patched program.

Extraction of bug localization benchmarks from history

This paper presents iBUGS, an approach that semiautomatically extracts benchmarks for bug localization from the history of a project, and demonstrates the relevance of the dataset with a case study on the bug localization tool AMPLE.

The ManyBugs and IntroClass Benchmarks for Automated Repair of C Programs

This paper outlines the need for a new set of benchmarks and the requirements they should meet, and presents two datasets, ManyBugs and IntroClass, consisting between them of 1,183 defects in 15 C programs. The datasets are designed to support the comparative evaluation of automatic repair algorithms for a variety of experimental questions.

Dissection of a bug dataset: Anatomy of 395 patches from Defects4J

This work studies in depth 395 patches of the Defects4J dataset and extracts a set of properties that can be used to characterize and compare different bug datasets.

How to Design a Program Repair Bot? Insights from the Repairnator Project

The Repairnator bot is an autonomous agent that constantly monitors test failures, reproduces bugs, and runs program repair tools against each reproduced bug.

ASTOR: a program repair library for Java (demo)

Astor is a publicly available program repair library that includes the implementation of three notable repair approaches (jGenProg, jKali and jMutRepair). The authors envision that the research community will use Astor to set up comparative evaluations and explore the design space of automatic repair for Java.

Automatic Software Repair: a Bibliography

A novel and structured overview of the diversity of bug oracles and repair operators used in the literature is provided, with techniques such as checkpoint and restart, reconfiguration, and invariant restoration.