How different are different diff algorithms in Git?

@article{Nugroho2019HowDA,
  title={How different are different diff algorithms in Git?},
  author={Yusuf Sulistyo Nugroho and Hideaki Hata and Ken-ichi Matsumoto},
  journal={Empirical Software Engineering},
  year={2019},
  volume={25},
  pages={790--823}
}
Automatically identifying the differences between two versions of a file is a common and basic task in many applications that mine code repositories. Git, a version control system, provides a diff utility, and users can choose among several diff algorithms, ranging from the default Myers algorithm to the more advanced Histogram algorithm. From our systematic mapping, we identified three popular applications of diff in recent studies. On the impact on code churn metrics in 14 Java projects, we obtained different values…
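The comparison the abstract describes can be approximated with Git's own `--diff-algorithm` option. The following is a minimal sketch, not the authors' tooling: it assumes code churn is counted as lines added plus lines deleted, and the function names (`numstat_command`, `churn_from_numstat`, `churn`) are hypothetical helpers introduced only for illustration.

```python
import subprocess


def numstat_command(old, new, algorithm):
    """Build a git command that reports per-file added/deleted line counts.

    `algorithm` is passed to git's --diff-algorithm option
    (for example "myers" or "histogram").
    """
    return ["git", "diff", "--numstat", f"--diff-algorithm={algorithm}", old, new]


def churn_from_numstat(numstat_output):
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has the form "<added>\t<deleted>\t<path>"; binary files
    are reported as "-\t-\t<path>" and are skipped here.
    """
    added = deleted = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue  # blank line or binary file
        added += int(parts[0])
        deleted += int(parts[1])
    return added, deleted


def churn(repo, old, new, algorithm):
    """Run git in `repo` and return (added, deleted) between two revisions."""
    result = subprocess.run(
        numstat_command(old, new, algorithm),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return churn_from_numstat(result.stdout)
```

Calling `churn(repo, old, new, "myers")` and `churn(repo, old, new, "histogram")` on the same pair of revisions and comparing the two tuples gives a rough view of how much the churn counts depend on the chosen algorithm.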
An empirical study on the use of SZZ for identifying inducing changes of non-functional bugs
TLDR
This paper conducts an empirical study on the results of the SZZ approach when used to identify the inducing changes of the non-functional bugs in the NFBugs dataset, and finds that prior criteria may be irrelevant for non-functional bugs.
Just-In-Time Defect Identification and Localization: A Two-Phase Framework
TLDR
JIT defect localization is the next step of JIT defect identification (i.e., after a buggy change is identified, suspicious source code lines are identified) and is referred to as “Just-In-Time (JIT) Defect localization”.
Learning to Generate Corrective Patches using Neural Machine Translation
TLDR
This paper proposes Ratchet, a corrective patch generation system using neural machine translation, and shows that Ratchet can generate syntactically valid statements 98.7% of the time and achieve an F1-measure between 0.41 and 0.83 with respect to the actual fixes adopted in the code base.
Software evolution: the lifetime of fine-grained elements
A model regarding the lifetime of individual source code lines or tokens can estimate maintenance effort, guide preventive maintenance, and, more broadly, identify factors that can improve the…
Does Refactoring Break Tests and to What Extent?
Refactoring as a process is aimed at improving the quality of a software system while preserving its external behavior. In practice, refactoring comes in the form of many specific and diverse…
Science-Software Linkage: The Challenges of Traceability between Scientific Knowledge and Software Artifacts
TLDR
The state of the practice of linking research papers and associated source code is summarized, highlighting the recent efforts towards creating and maintaining such links and outlining challenges related to traceability and opportunities for overcoming these challenges.
SATDBailiff- Mining and Tracking Self-Admitted Technical Debt
TLDR
SATDBailiff is a tool that uses an existing state-of-the-art SATD detection tool to identify SATD in method comments and then track its lifespan. It supports researchers and practitioners in better tracking SATD instances, providing them with a reliable tool that can be easily extended.
Context-aware Retrieval-based Deep Commit Message Generation
TLDR
CoRec is a context-aware encoder-decoder model that randomly selects the previous output of the decoder or the embedding vector of a ground truth word as context to make the model gradually aware of previous alignment choices and uses the retrieval diff to guide the probability distribution for the final generated vocabulary.
Deployment of a change‐level software defect prediction solution into an industrial setting
Applying change‐level software defect prediction (SDP) in practice has several challenges regarding model validation techniques, data accuracy, and prediction performance consistency. A few studies…
Lock-Free Collaboration Support for Cloud Storage Services with Operation Inference and Transformation
TLDR
This paper designs intelligent approaches to the inference and transformation of users’ editing operations, as well as optimizations to the maintenance of files’ historic versions, and builds an open-source system UFC2 (User-Friendly Collaborative Cloud) to embody this design.

References

The Uniqueness of Changes: Characteristics and Applications
TLDR
This paper presents a definition of unique changes, provides a method for identifying them in software project history, explores how prevalent unique changes are, and investigates where they occur along the architecture of the project.
Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction
TLDR
The change distilling algorithm is presented, a tree differencing algorithm for fine-grained source code change extraction that approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al.
Diff/TS: A Tool for Fine-Grained Structural Change Analysis
TLDR
This paper reports on a tool for fine-grained analysis of structural changes made between revisions of programs, and presents several applications including software "archaeology" on a widely known open source software project and automated "phylogenetic" malware classification based on control flows.
Comparing text‐based and dependence‐based approaches for determining the origins of bugs
TLDR
Both the text approach and the dependence approach were partially successful across a variety of bugs. The results suggest that the precise definition of program dependence could affect performance, as could whether the approaches identified a single origin or multiple origins.
Mining Software Repositories for Accurate Authorship
TLDR
Two new line-level authorship models are presented to overcome the limitation of current tools that assume that the last developer to change a line of code is its author regardless of all earlier changes.
Move-optimized source code tree differencing
  • Georg Dotzler, M. Philippsen
  • Computer Science
  • 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE)
  • 2016
TLDR
Five general optimizations that can be added to state-of-the-art tree differencing algorithms to shorten the resulting edit scripts are presented, along with the novel Move-optimized Tree DIFFerencing algorithm (MTDIFF), which has a higher accuracy in detecting moved code parts.
An Algorithm for Differential File Comparison
TLDR
The program diff reports differences between two files, expressed as a minimal list of line changes to bring either file into agreement with the other, based on ideas from several sources.
ClDiff: Generating Concise Linked Code Differences
TLDR
The goal of ClDiff is to generate concise linked code differences, whose granularity lies between that of existing code differencing and code change summarization methods, so that code differences are easier to understand.
Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities
TLDR
This work investigated whether software metrics obtained from source code and development history are discriminative and predictive of vulnerable code locations, and predicted over 80 percent of the known vulnerable files with less than 25 percent false positives for both projects.
A Framework for Evaluating the Results of the SZZ Approach for Identifying Bug-Introducing Changes
TLDR
The proposed framework provides a systematic means for evaluating the data that is generated by a given SZZ implementation and finds that current SZZ implementations still lack mechanisms to accurately identify bug-introducing changes.