Fine-grained and accurate source code differencing

  • Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, Martin Monperrus
  • Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering
At the heart of software evolution is a sequence of edit actions, called an edit script, made to a source code file. Since software systems are stored version by version, the edit script has to be computed from these versions, which is known as a complex task. Existing approaches usually compute edit scripts at the text granularity with only add line and delete line actions. However, inferring syntactic changes from such an edit script is hard. Since moving code is a frequent action performed… 
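The limitation of line-granularity edit scripts can be illustrated with a minimal Python sketch. It is not taken from the paper: it uses the standard `difflib` module, and the function name `line_edit_script` and the sample lines are hypothetical. It shows how a script restricted to add-line and delete-line actions represents a moved line.

```python
import difflib


def line_edit_script(old_lines, new_lines):
    """Build an edit script that uses only 'delete' and 'add' line actions."""
    script = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A 'replace' opcode is expressed as deletions followed by additions,
        # because this sketch has no update action.
        if tag in ("delete", "replace"):
            script.extend(("delete", line) for line in old_lines[i1:i2])
        if tag in ("insert", "replace"):
            script.extend(("add", line) for line in new_lines[j1:j2])
    return script


# Hypothetical example: the first statement is moved to the end of the file.
old = ["int a = 1;", "log(b);", "return b;"]
new = ["log(b);", "return b;", "int a = 1;"]

script = line_edit_script(old, new)
for action, line in script:
    print(action, line)
```

In this sketch the moved statement appears as one delete and one add of the same text. Nothing in the script records that the two actions belong together, so the move has to be inferred afterwards.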

Generating simpler AST edit scripts by considering copy-and-paste

This paper proposes treating copy-and-paste as an editing action in tree-based edit scripts (editing sequences that transform one tree into another), which yields edit scripts that are both simpler and closer to developers' actual editing sequences.

Move-optimized source code tree differencing

  • Georg Dotzler, M. Philippsen
  • Computer Science
    2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE)
  • 2016
Five general optimizations that can be added to state-of-the-art tree differencing algorithms to shorten the resulting edit scripts are presented, along with the novel Move-optimized Tree DIFFerencing algorithm (MTDIFF), which has a higher accuracy in detecting moved code parts.

Generating Accurate and Compact Edit Scripts Using Tree Differencing

The Iterative Java Matcher (IJM) builds upon GumTree and aims at generating more accurate and compact edit scripts that capture the developer's intent by improving the quality of the generated move and update actions.

Beyond GumTree: A Hybrid Approach to Generate Edit Scripts

This research proposes generating easier-to-understand edit scripts (ESs) by using not only the structure of the AST but also line-difference information, and confirms that ESs generated by this method are more helpful for understanding source code differences than those generated by GumTree.

Staged Tree Matching for Detecting Code Move across Files

This research proposes to construct a single abstract syntax tree from all source files included in a project and to perform a staged tree matching to detect across-file code moves efficiently and accurately.

Querying distilled code changes to extract executable transformations

This work introduces a tool-supported approach that identifies minimal executable subsequences in a sequence of distilled changes that implement a particular evolution pattern, specified in terms of intermediate states of the AST that undergoes each change.

ChangeMacroRecorder: Accurate Recording of Fine-Grained Textual Changes of Source Code

ChangeMacroRecorder is proposed, a tool that automatically and silently records all textual changes to source code and, in real time, correlates those changes with the actions causing them while a programmer is writing and modifying code in Eclipse's Java editor.

Understanding Software Changes: Extracting, Classifying, and Presenting Fine-Grained Source Code Changes

  • Veit Frick
  • Computer Science
    2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
  • 2020
This work plans to improve the accuracy and classification of the extracted source code changes and to extend them by analysing the fine-grained changes of source code dependencies, and proposes a dynamical analysis of the impact of the previously extracted changes on performance metrics.

Inferring and Applying Type Changes

TC-Infer is introduced, a novel technique that infers rewrite rules that capture the required adaptations from the version histories of open source projects and is shown to be highly effective at applying type changes.

A structural model for contextual code changes

A powerful and lightweight neural model is presented that achieves a 28% relative gain over state-of-the-art sequential models and 2× higher accuracy than syntactic models that learn to generate the edited code, as opposed to modeling the edits directly.



Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction

The change distilling algorithm is presented, a tree differencing algorithm for fine-grained source code change extraction that approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al.

Dex: a semantic-graph differencing tool for studying changes in large code bases

An automated tool called Dex (difference extractor) for analyzing syntactic and semantic changes in large C-language code bases and the results of applying it to analyze bug fixes from the Apache and GCC projects are described.

Diff/TS: A Tool for Fine-Grained Structural Change Analysis

This paper reports on a tool for fine-grained analysis of structural changes made between revisions of programs, and presents several applications including software "archaeology" on a widely known open source software project and automated "phylogenetic" malware classification based on control flows.

Using origin analysis to detect merging and splitting of source code entities

This paper discusses how extended origin analysis is used to aid in the detection of merging and splitting of files and functions in procedural code, and shows how reasoning about how call relationships have changed can aid a developer in locating where merges and splits have occurred.

A fast abstract syntax tree interpreter for R

This paper tries to see how far one can push a naive implementation while remaining portable and not requiring expertise in compilers and runtime systems.

diffX: an algorithm to detect changes in multi-version XML documents

The diffX algorithm for detecting changes between two versions of an XML document is presented, in order to optimize the runtime of mapping the nodes between the two versions and to minimize the size of the edit script.

A differencing algorithm for object-oriented programs

This work presents a technique for comparing object-oriented programs that identifies both differences and correspondences between two versions of a program and presents empirical results that show the efficiency and effectiveness of the technique on a real program.

Clone Management for Evolving Software

This paper introduces JSync, a novel clone management tool that represents source code and clones as (sub)trees in Abstract Syntax Trees, measures code similarity based on structural characteristic vectors, and describes code changes as tree editing scripts.

AURA: a hybrid approach to identify framework evolution

AURA, a novel hybrid approach that combines call dependency and text similarity analyses to overcome limitations of one-replaced-by-many and many-replaced-by-one methods, is introduced.

Syntax tree fingerprinting for source code similarity detection

This paper presents a simple and scalable architecture based on AST fingerprinting that efficiently indexes AST representations in a database, quickly detects clone clusters that are exact with respect to the source code abstraction, and easily retrieves their corresponding ASTs.
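The general idea of AST fingerprinting can be sketched in a few lines of Python. This is an illustrative assumption rather than the architecture described in that paper: it uses Python's standard `ast` and `hashlib` modules, an in-memory dictionary instead of a database, and hypothetical names `fingerprint` and `clone_clusters`. The abstraction chosen here ignores identifier names and constant values, so functions with the same shape hash to the same fingerprint.

```python
import ast
import hashlib
from collections import defaultdict


def fingerprint(node):
    """Hash a subtree from its node types only (names and constants ignored)."""
    children = ",".join(fingerprint(c) for c in ast.iter_child_nodes(node))
    text = f"{type(node).__name__}({children})"
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def clone_clusters(source):
    """Group function definitions whose abstracted ASTs are identical."""
    index = defaultdict(list)  # stand-in for a fingerprint database
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            index[fingerprint(node)].append(node.name)
    return [names for names in index.values() if len(names) > 1]


# Hypothetical input: f and g differ only in names and constants.
SAMPLE = """
def f(x):
    return x + 1

def g(y):
    return y + 2

def h(z):
    return z * 2
"""

clusters = clone_clusters(SAMPLE)
print(clusters)
```

Because operator nodes such as `Add` and `Mult` are part of the tree, `h` falls outside the cluster containing `f` and `g`. A different abstraction level would change which fragments count as exact clones.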