A large-scale study on human-cloned changes for automated program repair

Fernanda Madeiral and Thomas Durieux
2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
Research in automatic program repair has shown that real bugs can be automatically fixed. However, there are several challenges involved in such a task that are not yet fully addressed. As an example, consider that a test-suite-based repair tool performs a change in a program to fix a bug spotted by a failing test case, but then the same or another test case fails. This could mean that the change is a partial fix for the bug or that another bug was manifested. However, the repair tool discards… 
2 Citations

CVEfixes: automated collection of vulnerabilities and their fixes from open-source software
This work proposes a method to automatically collect and curate a comprehensive vulnerability dataset from Common Vulnerabilities and Exposures records in the public National Vulnerability Database (NVD), and shares an initial release of the resulting vulnerability dataset named CVEfixes.
Megadiff: A Dataset of 600k Java Source Code Changes Categorized by Diff Size
Megadiff is presented, a dataset of source code diffs that can be used for research on commit comprehension, fault localization, automated program repair, and machine learning on code changes.


BEARS: An Extensible Java Bug Benchmark for Automatic Program Repair Studies
BEARS, a project for collecting and storing bugs in an extensible bug benchmark for automatic repair studies in Java, is presented, and version 1.0 of BEARS is delivered, which contains 251 reproducible bugs collected from 72 projects that use the Travis CI and Maven build environment.
Towards an automated approach for bug fix pattern detection
PPD, a detector of repair patterns in patches that performs source code change analysis at the abstract-syntax-tree level, is designed, implemented, and evaluated; the authors conclude that PPD has the potential to detect as many repair patterns as human manual analysis.
Defects4J: a database of existing faults to enable controlled testing studies for Java programs
Defects4J is a database and extensible framework providing real bugs to enable reproducible studies in software testing research. It provides a high-level interface to common tasks in software testing research, making it easy to conduct and reproduce empirical studies.
Harnessing Evolution for Multi-Hunk Program Repair
This work presents a novel APR technique that generalizes single-hunk repair techniques to include an important class of multi-hunk bugs, namely bugs that may require applying a substantially similar patch at a number of locations.
How Often Do Single-Statement Bugs Occur?: The ManySStuBs4J Dataset
A dataset of 153,652 single-statement bug-fix changes mined from 1,000 popular open-source Java projects is provided, annotated by whether each change matches any of a set of 16 bug templates inspired by state-of-the-art program repair techniques. It is intended as a resource for both future work in program repair and studies in empirical software engineering.
A Survey on Software Clone Detection Research
The state of the art in clone detection research is surveyed, the clone terms commonly used in the literature are described along with their mappings to the commonly used clone types, and several open problems related to clone detection research are pointed out.
Automatic Software Repair: a Bibliography
A novel and structured overview of the diversity of bug oracles and repair operators used in the literature is provided, with techniques such as checkpoint and restart, reconfiguration, and invariant restoration.
Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis
Angelix is a novel semantics-based repair method that scales up to programs of similar size as are handled by search-based repair tools such as GenProg and SPR, and is more scalable than previously proposed semantics-based repair methods such as SemFix and DirectFix.
Feature-based detection of bugs in clones
This work describes how far certain features of clones can be used to automatically identify incomplete bugfixes. This is relevant for developers, to locate incomplete bugfixes (that is, defects still existing in the system), and for clone researchers, to quickly find examples that motivate the use of clone management.
Dissection of a bug dataset: Anatomy of 395 patches from Defects4J
This work studies in depth 395 patches of the Defects4J dataset and extracts a set of properties that can be used to characterize and compare different bug datasets.