How different are different diff algorithms in Git?

@article{Nugroho2019HowDA,
  title={How different are different diff algorithms in Git?},
  author={Yusuf Sulistyo Nugroho and Hideaki Hata and Ken-ichi Matsumoto},
  journal={Empirical Software Engineering},
  year={2019},
  volume={25},
  pages={790--823}
}
Automatically identifying the differences between two versions of a file is a common, basic task in many applications that mine code repositories. Git, a version control system, provides a diff utility that lets users choose among several diff algorithms, from the default Myers algorithm to the more advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. Regarding the impact on code churn metrics in 14 Java projects, we obtained different values…
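As a rough illustration of how the choice of diff algorithm can change code churn counts, the sketch below runs `git diff --numstat` once per algorithm and sums the added and deleted lines. It is not the study's measurement setup. The function names `parse_numstat` and `churn_by_algorithm` are hypothetical, and the sketch assumes `git` is on the PATH and that the two revisions exist in the repository.

```python
import subprocess

# Values accepted by git's --diff-algorithm option.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def parse_numstat(output):
    """Sum added and deleted line counts from `git diff --numstat` output."""
    added = deleted = 0
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # not a numstat record
        if parts[0] == "-" or parts[1] == "-":
            continue  # binary files are reported as "-<TAB>-<TAB>path"
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def churn_by_algorithm(repo, old_rev, new_rev):
    """Return {algorithm: (added, deleted)} for the diff between two revisions.

    Hypothetical helper: `repo` is a path to a local Git working tree.
    """
    results = {}
    for algo in ALGORITHMS:
        completed = subprocess.run(
            ["git", "-C", repo, "diff", "--numstat",
             f"--diff-algorithm={algo}", old_rev, new_rev],
            capture_output=True, text=True, check=True,
        )
        results[algo] = parse_numstat(completed.stdout)
    return results
```

Calling `churn_by_algorithm(".", "HEAD~1", "HEAD")` inside a repository returns one `(added, deleted)` pair per algorithm. Any disagreement between the pairs for the same two revisions is the kind of difference the abstract refers to.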
An empirical study on the use of SZZ for identifying inducing changes of non-functional bugs
TLDR
This paper conducts an empirical study on the results of the SZZ approach when used to identify the inducing changes of the non-functional bugs in the NFBugs dataset, and finds that prior criteria may be irrelevant for non-functional bugs.
Just-In-Time Defect Identification and Localization: A Two-Phase Framework
TLDR
A JIT defect localization approach that leverages software naturalness with the N-gram model is proposed; it achieves reasonable performance and outperforms the two baselines by a substantial margin on four ranking measures.
Learning to Generate Corrective Patches using Neural Machine Translation
TLDR
This paper proposes Ratchet, a corrective patch generation system using neural machine translation, and shows that Ratchet can generate syntactically valid statements 98.7% of the time and achieve an F1-measure between 0.41 and 0.83 with respect to the actual fixes adopted in the code base.
Software evolution: the lifetime of fine-grained elements
A model regarding the lifetime of individual source code lines or tokens can estimate maintenance effort, guide preventive maintenance, and, more broadly, identify factors that can improve the
Science-Software Linkage: The Challenges of Traceability between Scientific Knowledge and Software Artifacts
TLDR
The state of the practice of linking research papers and associated source code is summarized, highlighting the recent efforts towards creating and maintaining such links and outlining challenges related to traceability and opportunities for overcoming these challenges.
SATDBailiff- Mining and Tracking Self-Admitted Technical Debt
TLDR
SATDBailiff is a tool that uses an existing state-of-the-art SATD detection tool to identify SATD in method comments and then track its lifespan, giving researchers and practitioners a reliable, easily extended tool for tracking SATD instances.
Context-aware Retrieval-based Deep Commit Message Generation
TLDR
CoRec is a context-aware encoder-decoder model that randomly selects the previous output of the decoder or the embedding vector of a ground truth word as context to make the model gradually aware of previous alignment choices and uses the retrieval diff to guide the probability distribution for the final generated vocabulary.
Deployment of a change‐level software defect prediction solution into an industrial setting
TLDR
This work empirically assesses the online SDP's performance with various lengths of the time gap between the train and test set and model update periods, and investigates whether an “offline” SDP could reflect its “online” (real‐life) performance, and other deployment decisions: the model re‐training process and update period.
Lock-Free Collaboration Support for Cloud Storage Services with Operation Inference and Transformation
TLDR
This paper designs intelligent approaches to the inference and transformation of users’ editing operations, as well as optimizations to the maintenance of files’ historic versions, and builds an open-source system UFC2 (User-Friendly Collaborative Cloud) to embody this design.
Visualization of program development process
  • V. Shynkarenko, O. Zhevago
  • Computer Science
    2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT)
  • 2019
TLDR
An extension for Visual Studio that monitors the programming process is developed, giving the teacher a way to visually monitor program development and thus take an active part in forming an effective student programming style.

References

SHOWING 1-10 OF 40 REFERENCES
The Uniqueness of Changes: Characteristics and Applications
TLDR
This paper presents a definition of unique changes, provides a method for identifying them in software project history, explores how prevalent unique changes are, and investigates where they occur along the architecture of the project.
Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction
TLDR
The change distilling algorithm is presented, a tree differencing algorithm for fine-grained source code change extraction that approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al.
Diff/TS: A Tool for Fine-Grained Structural Change Analysis
TLDR
This paper reports on a tool for fine-grained analysis of structural changes made between revisions of programs, and presents several applications including software "archaeology" on a widely known open source software project and automated "phylogenetic" malware classification based on control flows.
Comparing text‐based and dependence‐based approaches for determining the origins of bugs
TLDR
Both the text approach and the dependence approach were partially successful across a variety of bugs and suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins.
Mining Software Repositories for Accurate Authorship
TLDR
Two new line-level authorship models are presented to overcome the limitation of current tools that assume that the last developer to change a line of code is its author regardless of all earlier changes.
Move-optimized source code tree differencing
  • Georg Dotzler, M. Philippsen
  • Computer Science
    2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE)
  • 2016
TLDR
Five general optimizations that can be added to state-of-the-art tree differencing algorithms to shorten the resulting edit scripts are presented, along with the novel Move-optimized Tree DIFFerencing algorithm (MTDIFF), which has a higher accuracy in detecting moved code parts.
An Algorithm for Differential File Comparison
TLDR
The program diff reports differences between two files, expressed as a minimal list of line changes to bring either file into agreement with the other, based on ideas from several sources.
ClDiff: Generating Concise Linked Code Differences
TLDR
The goal of ClDiff is to generate concise linked code differences whose granularity is in between the existing code differencing and code change summarization methods, to generate more easily understandable code differences.
Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities
TLDR
This work investigated whether software metrics obtained from source code and development history are discriminative and predictive of vulnerable code locations, and predicted over 80 percent of the known vulnerable files with less than 25 percent false positives for both projects.
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes
TLDR
The proposed framework provides a systematic means for evaluating the data that is generated by a given SZZ implementation and finds that current SZZ implementations still lack mechanisms to accurately identify bug-introducing changes.