Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction

  • Beat Fluri, Michael Würsch, Martin Pinzger, Harald C. Gall
  • IEEE Transactions on Software Engineering
A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine-grained source code change extraction. To that end, we have improved the existing algorithm by Chawathe et al. for extracting changes in hierarchically structured data. Our algorithm extracts changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum…

Move-optimized source code tree differencing

  • Georg Dotzler, M. Philippsen
  • Computer Science
    2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE)
  • 2016
Five general optimizations that can be added to state-of-the-art tree differencing algorithms to shorten the resulting edit scripts are presented, along with the novel Move-optimized Tree DIFFerencing algorithm (MTDIFF), which has a higher accuracy in detecting moved code parts.

Querying distilled code changes to extract executable transformations

This work introduces a tool-supported approach that identifies minimal executable subsequences in a sequence of distilled changes that implement a particular evolution pattern, specified in terms of intermediate states of the AST that undergoes each change.

Understanding Software Changes: Extracting, Classifying, and Presenting Fine-Grained Source Code Changes

  • Veit Frick
  • Computer Science
    2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
  • 2020
This work plans to improve the accuracy and classification of the extracted source code changes and to extend them by analysing the fine-grained changes of source code dependencies, and proposes a dynamical analysis of the impact of the previously extracted changes on performance metrics.

An Empirical Study on the Characteristics of Python Fine-Grained Source Code Change Types

An automatic tool is implemented to extract 77 kinds of fine-grained source code change types from commit history information; the study provides useful guidance and insights for improving the understanding of source code evolution in dynamic-language software.

Staged Tree Matching for Detecting Code Move across Files

This research proposes to construct a single abstract syntax tree from all source files included in a project and to perform a staged tree matching to detect across-file code moves efficiently and accurately.

Generating Accurate and Compact Edit Scripts Using Tree Differencing

The Iterative Java Matcher (IJM) builds upon GumTree and aims at generating more accurate and compact edit scripts that capture the developer's intent by improving the quality of the generated move and update actions.

The effect of IMPORT change in software change history

Experimental results show that the IMPORT change meaningfully affects other changes and that it would be better to consider IMPORT change types in change analysis work.

Inferring Restructuring Operations on Logical Structure of Java Source Code

A technique of inferring restructuring operations on logical structure of Java source code by finding match candidates based on the similarity of element contents and identifying matches with Bayesian inference based on empirical data is presented.

Diff/TS: A Tool for Fine-Grained Structural Change Analysis

This paper reports on a tool for fine-grained analysis of structural changes made between revisions of programs, and presents several applications including software "archaeology" on a widely known open source software project and automated "phylogenetic" malware classification based on control flows.

Detecting Program Changes from Edit History of Source Code

A novel mechanism is proposed that automatically detects individual program changes, restores snapshots of the program from the history of edit operations on the target source code, and compares the class members that result from syntax analysis of the respective snapshots.



Classifying Change Types for Qualifying Change Couplings

  • B. Fluri, H. Gall
  • Computer Science
    14th IEEE International Conference on Program Comprehension (ICPC'06)
  • 2006
This work developed an approach for analyzing and classifying change types based on code revisions and found that in many cases large numbers of added and/or deleted lines are accompanied not by significant changes but by small textual adaptations (such as indentation).

Dex: a semantic-graph differencing tool for studying changes in large code bases

An automated tool called Dex (difference extractor) for analyzing syntactic and semantic changes in large C-language code bases and the results of applying it to analyze bug fixes from the Apache and GCC projects are described.

Predicting source code changes by mining change history

An approach that applies data mining techniques to determine change patterns can be used to recommend potentially relevant source code to a developer performing a modification task and can reveal valuable dependencies, as shown by applying it to the Eclipse and Mozilla open source projects.

Supporting source code difference analysis

The paper describes an approach to easily conduct analysis of source-code differences using meta-differencing to reflect the fact that additional knowledge of the differences can be automatically derived.

Using origin analysis to detect merging and splitting of source code entities

This paper discusses how extended origin analysis is used to aid in the detection of merging and splitting of files and functions in procedural code, and shows how reasoning about how call relationships have changed can aid a developer in locating where merges and splits have occurred.

Detecting similar Java classes using tree algorithms

Initial results of the technique indicate that it is indeed useful to identify similar Java classes, and it successfully identifies the ex ante and ex post versions of refactored classes and provides some interesting insights into within-version and between-version dependencies of classes within a Java project.

JDiff: A differencing technique and tool for object-oriented programs

This paper presents a technique for comparing object-oriented programs that identifies both differences and correspondences between two versions of a program, and presents the results of four empirical studies that show the efficiency and effectiveness of the technique when used on real programs.

Identifying Changed Source Code Lines from Version Repositories

This paper shows how the evolution of changes at source code line level can be inferred from CVS repositories, by combining information retrieval techniques and the Levenshtein edit distance.
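
The Levenshtein edit distance mentioned in this summary can be illustrated with a minimal sketch. It shows only the standard dynamic-programming distance and a normalized similarity score for comparing two source lines; the function names `levenshtein` and `normalized_similarity` are hypothetical, and the sketch does not reproduce the paper's combination with information retrieval techniques.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) ca
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 for identical lines, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    old_line = "int count = items.size();"
    new_line = "int total = items.size();"
    print(levenshtein(old_line, new_line))
    print(round(normalized_similarity(old_line, new_line), 2))
```

A score near 1.0 suggests that a line in the new revision is a modified version of a line in the old revision rather than an unrelated addition; any threshold used to make that decision would be an assumption to tune per project.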

UMLDiff: an algorithm for object-oriented design differencing

UMLDiff, an algorithm for automatically detecting structural changes between the designs of subsequent versions of object-oriented software, is presented; it enables subsequent design-evolution analyses from multiple perspectives in support of various evolution activities.

When functions change their names: automatic detection of origin relationships

This paper proposes an automated algorithm that identifies entity mapping at the function level across revisions even when an entity's name changes in the new revision, based on computing function similarities.