Towards Informative Tagging of Code Fragments to Support the Investigation of Code Clones

Daisuke Nishioka, Toshihiro Kamiya. 2021 IEEE 15th International Workshop on Software Clones (IWSC).
Investigating the code fragments of code clones reported by clone detection tools is time-consuming, especially when a large number of reference source files are available. This paper proposes (i) a method for clustering a clone class, detected by a clone detection tool using syntactic similarity, by topic similarity, treating its code fragments as sequences of words, and (ii) a method for assigning short tags to the resulting clusters. We also…
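
The two steps named in the abstract, grouping the fragments of a clone class by the words they contain and then attaching short tags to each group, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the topic model, so plain bag-of-words cosine similarity with greedy threshold grouping stands in for topic similarity here, and the tag score (word count inside a cluster minus its count outside) is an assumption. The function names `split_words`, `cluster_fragments`, and `tag_clusters` and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def split_words(fragment):
    """Turn a code fragment into lowercase words, splitting camelCase and snake_case."""
    words = []
    for ident in re.findall(r"[A-Za-z]+", fragment):
        # "readXMLFile" -> ["read", "XML", "File"]
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", ident)
        # Drop single-letter words such as loop variables (a simplification).
        words.extend(p.lower() for p in parts if len(p) >= 2)
    return words


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_fragments(fragments, threshold=0.5):
    """Greedy grouping: a fragment joins the first cluster whose summed
    word counts are similar enough, otherwise it starts a new cluster."""
    bags = [Counter(split_words(f)) for f in fragments]
    clusters = []  # each cluster is a list of fragment indices
    for i, bag in enumerate(bags):
        for members in clusters:
            centroid = sum((bags[j] for j in members), Counter())
            if cosine(bag, centroid) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters, bags


def tag_clusters(clusters, bags, max_words=2):
    """Tag each cluster with the words that occur more inside it than outside it."""
    tags = []
    for members in clusters:
        inside = sum((bags[j] for j in members), Counter())
        outside = sum(
            (bags[j] for j in range(len(bags)) if j not in members), Counter()
        )
        # Highest (inside - outside) first; ties broken alphabetically.
        ranked = sorted(inside, key=lambda w: (-(inside[w] - outside[w]), w))
        tags.append(ranked[:max_words])
    return tags


if __name__ == "__main__":
    clone_class = [
        "for (File f : files) { readFile(f); closeFile(f); }",
        "for (File g : files) { read_file(g); close_file(g); }",
        "for (Socket s : sockets) { openSocket(s); sendPacket(s); }",
        "for (Socket t : sockets) { open_socket(t); send_packet(t); }",
    ]
    clusters, bags = cluster_fragments(clone_class)
    for members, tag in zip(clusters, tag_clusters(clusters, bags)):
        print(members, tag)
```

All four fragments share the same loop structure, so a syntactic detector could place them in one clone class; the word-level grouping is what separates the file-handling fragments from the socket-handling ones. Language keywords such as `for` are kept as ordinary words in this sketch; a fuller version would likely filter them out.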

Comparison and Evaluation of Clone Detection Tools
An experiment is presented that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC) and selects techniques that cover the whole spectrum of the state-of-the-art in clone detection.
A metric-based approach to identifying refactoring opportunities for merging code clones in a Java software system
It can be concluded that this method can efficiently merge code clones and give metrics that are indicators for certain refactoring methods rather than suggesting the refactored methods themselves.
Extracting code clones for refactoring using combinations of clone metrics
This paper proposes a method that combines clone metrics to extract code clones for refactoring and reports an empirical study on a web application developed by a Japanese software company, indicating that combinations of simple clone metrics are more effective at extracting code clones for refactoring than any individual clone metric.
CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code
A new clone detection technique is proposed that transforms the input source text and then compares it token by token; the technique found clones effectively, and the accompanying metrics were able to identify the characteristics of the systems.
Identification of high-level concept clones in source code
A. Marcus, J. Maletic. Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001), 2001.
The intention of the approach is to enhance and augment existing clone detection methods that are based on structural analysis and improve the quality of clone detection.
SHINOBI: A Tool for Automatic Code Clone Detection in the IDE
SHINOBI, a novel code clone detection/modification tool that is designed to aid in recognizing and highlighting code clones during software maintenance tasks, is introduced.
Code Clone Detection on Specialized PDGs with Heuristics
Yoshiki Higo, S. Kusumoto. 2011 15th European Conference on Software Maintenance and Reengineering, 2011.
The proposed PDG specializations and detection heuristics for enhancing PDG-based code clone detection are shown to be effective by applying them to 4 open source systems.
Gemini: maintenance support environment based on code clone analysis
A maintenance support environment called Gemini is developed. It visualizes the code clone information produced by a code clone detection tool, CCFinder, and lets the user specify a set of distinctive code clones through the GUI and refer to the source code fragments corresponding to a clone on the plot or graph.
Scalable detection of semantic clones
This paper efficiently solves the tree similarity problem to create a scalable analysis that locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.
SourcererCC: Scaling Code Clone Detection to Big-Code
This paper presents a token-based clone detector, SourcererCC, that can detect both exact and near-miss clones from large inter-project repositories using a standard workstation; it evaluates the tool's scalability, execution time, recall, and precision, and compares it with four publicly available state-of-the-art tools.