Corpus ID: 235421942

Break-It-Fix-It: Unsupervised Learning for Program Repair

@inproceedings{yasunaga2021breakitfixit,
  title={Break-It-Fix-It: Unsupervised Learning for Program Repair},
  author={Michihiro Yasunaga and Percy Liang},
}
We consider repair tasks: given a critic (e.g., compiler) that assesses the quality of an input, the goal is to train a fixer that converts a bad example (e.g., code with syntax errors) into a good one (e.g., code with no syntax errors). Existing works create training data consisting of (bad, good) pairs by corrupting good examples using heuristics (e.g., dropping tokens). However, fixers trained on this synthetically generated data do not extrapolate well to the real distribution of bad inputs…
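The critic–fixer setup described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the critic is Python's own parser (code is "good" iff it compiles without a syntax error), and `toy_fixer` is a hypothetical stand-in for the learned fixer, here handling only one trivial error class.

```python
def is_good(code: str) -> bool:
    """Critic: accept code iff it parses with no syntax errors."""
    try:
        compile(code, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False

def toy_fixer(bad_code: str) -> str:
    """Stand-in for a learned fixer (illustrative only):
    this toy version just appends a missing closing parenthesis."""
    if bad_code.count("(") > bad_code.count(")"):
        return bad_code + ")"
    return bad_code

bad = "print('hello'"          # bad example: unbalanced parenthesis
fixed = toy_fixer(bad)         # candidate fix proposed by the fixer
assert not is_good(bad) and is_good(fixed)
```

The critic is what makes unsupervised training possible: it filters the fixer's outputs into verified (bad, good) pairs without any human labels.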


SelfAPR: Self-supervised Program Repair with Test Execution Diagnostics

SelfAPR correctly repairs 110 bugs from Defects4J, outperforming all supervised learning repair approaches.

Learning to repair: Repairing model output errors after deployment using a dynamic memory of feedback

It is shown that the memory-enhanced FBNet system learns to apply user feedback effectively to repair such errors, while making a start at avoiding similar past mistakes on new, unseen examples.

Neural Program Repair : Systems, Challenges and Solutions

A literature review of the latest NPR systems is undertaken to help interested readers understand advancements in this emerging field; to make the various NPR systems more understandable, they are decomposed into a four-phase pipeline with the design choices available at each phase.

Fix Bugs with Transformer through a Neural-Symbolic Edit Grammar

NSEdit performs robustly when programs vary from package to package and when buggy programs are concrete, and achieves a new state-of-the-art accuracy on the Tufano small dataset of the CodeXGLUE benchmark.

Repair Is Nearly Generation: Multilingual Program Repair with LLMs

This work introduces RING, a multilingual repair engine powered by a large language model trained on code (LLMC) such as Codex. RING enables a flipped model for programming assistance, one where the programmer writes code and the AI assistance suggests fixes, in contrast to traditional code suggestion technology.

BigIssue: A Realistic Bug Localization Benchmark

BigIssue is introduced, a general benchmark for realistic bug localization, with the aim of improving models' bug localization capabilities through attention to the full repository context and thereby advancing the state of the art in bug localization.

FixEval: Execution-based Evaluation of Program Fixes for Competitive Programming Problems

This work introduces FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their respective fixes, which provides a step towards real-world automatic bug fixing and model-generated code evaluation.

InCoder: A Generative Model for Code Infilling and Synthesis

INCODER is introduced, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling); the ability to condition on bidirectional context substantially improves performance on challenging tasks such as type inference, comment generation, and variable renaming.

On Distribution Shift in Learning-based Bug Detectors

This work proposes to train a bug detector in two phases: first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards real bugs, leveraging a multi-task hierarchy, focal loss, and contrastive learning to further boost performance.

Neurosymbolic Repair for Low-Code Formula Languages

LaMirage is developed, a LAst-MIle RepAir-engine GEnerator that combines symbolic and neural techniques to perform last-mile repair in low-code formula languages; the usability of the framework and its design considerations are discussed.



DeepDelta: learning to repair compilation errors

A novel approach is proposed that automatically learns patterns with a deep neural network and suggests program repairs for the most costly classes of build-time compilation failures, namely missing symbols and mismatched method signatures.

SampleFix: Learning to Correct Programs by Sampling Diverse Fixes

A deep generative model is proposed that automatically corrects programming errors by learning a distribution of potential fixes, formulated as a deep conditional variational autoencoder that samples diverse fixes for the given erroneous programs.

SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair

This paper devises, implements, and evaluates a technique, called SEQUENCER, for fixing bugs based on sequence-to-sequence learning on source code, which captures a wide range of repair operators without any domain-specific top-down design.

Learning to Fix Build Errors with Graph2Diff Neural Networks

This work presents a new deep learning architecture, called Graph2Diff, for automatically localizing and fixing build errors, which represents source code, build configuration files, and compiler diagnostic messages as a graph, and uses a Graph Neural Network model to predict a diff.

DeepFix: Fixing Common C Language Errors by Deep Learning

DeepFix is a multi-layered sequence-to-sequence neural network with attention which is trained to predict erroneous program locations along with the required correct statements and could fix 1881 programs completely and 1338 programs partially.

Graph-based, Self-Supervised Program Repair from Diagnostic Feedback

This work introduces a program-feedback graph, which connects symbols relevant to program repair in source code and diagnostic feedback, and then applies a graph neural network on top to model the reasoning process, and presents a self-supervised learning paradigm for program repair.

Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs

A learning-based approach to detect and fix a broad range of bugs in JavaScript programs, targeting bugs that are more complex and semantic in nature (i.e., bugs that require adding or deleting statements to fix).

Neural Program Repair by Jointly Learning to Localize and Repair

It is beneficial to train a model that jointly and directly localizes and repairs variable-misuse bugs; the experimental results show that the joint model significantly outperforms an enumerative solution that uses a pointer-based model for repair alone.

DeepBugs: a learning approach to name-based bug detection

DeepBugs is presented, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them.

Patching as Translation: the Data and the Metaphor

It is demonstrated empirically that there are subtle but critical distinctions between sequence-to-sequence models and translation models: while program repair benefits greatly from the former's general modeling architecture, it actually suffers from design decisions built into the latter, both in terms of translation accuracy and diversity.