Oreo: detection of clones in the twilight zone
@article{Saini2018OreoDO, title={Oreo: detection of clones in the twilight zone}, author={Vaibhav Saini and Farima Farmahinifarahani and Yadong Lu and Pierre Baldi and Cristina V. Lopes}, journal={Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering}, year={2018} }
Source code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type-1) to purely semantic (Type-4). Most clone detectors reported in the literature work well up to Type-3, which accounts for syntactic differences. In between Type-3 and Type-4, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect – the Twilight Zone. Most clone detectors reported in the literature…
82 Citations
On Precision of Code Clone Detection Tools
- Computer Science, 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)
- 2019
This work shows that the reported precision of these tools leads to significantly different conclusions and insights about the tools when different types of clones are taken into account, and stresses, once again, the importance of reporting inter-rater agreement.
Clone detection through srcClone: A program slicing based approach
- Computer Science, J. Syst. Softw.
- 2022
srcClone: Detecting Code Clones via Decompositional Slicing
- Computer Science, ICPC
- 2020
This paper presents a scalable slicing-based approach for detecting code clones, including semantic clones, and determines code segment similarity based on their corresponding program slices by taking advantage of a lightweight, publicly available, and scalable program slicing approach.
Improving Clone Detection Precision Using Machine Learning Techniques
- Computer Science, 2019 10th International Workshop on Empirical Software Engineering in Practice (IWESEP)
- 2019
This paper proposes an approach for increasing the precision of code clone detection using machine learning techniques and finds that the decision tree clone filter is helpful for decreasing the number of false positive clone classes in iClones, a well-known code clone detector.
Clone Detection on Large Scala Codebases
- Computer Science, 2020 IEEE 14th International Workshop on Software Clones (IWSC)
- 2020
Large scale experimental research on the performance of two state-of-the-art code clone detection techniques, SourcererCC and AutoenCODE, on both open source projects and an industrial project written in the Scala language reveals that both algorithms perform differently on the industrial project.
Enhancing code clone detection using control flow graphs
- Computer Science, International Journal of Electrical and Computer Engineering (IJECE)
- 2019
Experimental results demonstrate that using CFG features is a viable methodology in terms of the effectiveness of clone detection for both syntactic and semantic clones.
SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge
- Computer Science, 2020 IEEE 14th International Workshop on Software Clones (IWSC)
- 2020
This work proposed a methodology to generate a wide range of semantic clone benchmark(s) for different programming languages with minimal human validation, based on the knowledge provided by developers who participate in the crowd-sourced information website, Stack Overflow.
NIL: large-scale detection of large-variance clones
- Computer Science, ESEC/SIGSOFT FSE
- 2021
NIL is a token-based clone detector that efficiently identifies clone candidates using an N-gram representation of token sequences and an inverted index and verifies the clone candidates by measuring their similarity based on the longest common subsequence between their token sequences.
Semantic Code Clone Detection Via Event Embedding Tree and GAT Network
- Computer Science, 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS)
- 2020
This work proposes a code clone detection method based on event embedding tree and Graph Attention Network that can calculate the functional similarity of two pieces of code, thereby identifying semantically similar code fragments.
Code Clone Detection: A Literature Review
- Computer Science
- 2018
A literature review of code clone detection, especially from the perspective of source code representation, is presented, and the key issues of code clone research are summarized from three aspects: scientific, practical and technical difficulties.
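The NIL entry above describes a two-stage pipeline: an inverted index over token N-grams proposes clone candidates, and a longest-common-subsequence (LCS) comparison verifies them. The following is a minimal Python sketch of that general idea, not NIL's actual implementation. The function names, the N-gram size `n=3`, the `threshold=0.7`, and the normalization by the shorter fragment's length are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def find_clone_pairs(fragments, n=3, threshold=0.7):
    """Hypothetical helper: fragments maps a name to its token list.

    Returns a sorted list of (name_a, name_b) pairs judged to be clones.
    """
    # Stage 1: inverted index from each n-gram to the fragments containing it.
    index = defaultdict(set)
    for name, tokens in fragments.items():
        for gram in ngrams(tokens, n):
            index[gram].add(name)

    # Candidate pairs are fragments that share at least one n-gram.
    candidates = set()
    for names in index.values():
        ordered = sorted(names)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                candidates.add((a, b))

    # Stage 2: verify each candidate with an LCS-based similarity
    # (assumed here: LCS length divided by the shorter fragment's length).
    clones = []
    for a, b in sorted(candidates):
        ta, tb = fragments[a], fragments[b]
        similarity = lcs_length(ta, tb) / min(len(ta), len(tb))
        if similarity >= threshold:
            clones.append((a, b))
    return clones
```

Because only fragments that share an N-gram are ever compared, the quadratic LCS step runs on a small subset of all possible pairs, which is the point of filtering with an inverted index first.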
References
Semantic Clone Detection Using Machine Learning
- Computer Science, 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)
- 2016
A machine learning framework to automatically detect clones in software is presented; it is able to detect Type-3 clones and the most complicated kind of clones, Type-4.
SourcererCC: Scaling Code Clone Detection to Big-Code
- Computer Science, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)
- 2016
This paper presents a token-based clone detector, SourcererCC, that can detect both exact and near-miss clones from large inter-project repositories using a standard workstation, and evaluates the scalability, execution time, recall and precision, and compares it to four publicly available and state-of-the-art tools.
CCLearner: A Deep Learning-Based Clone Detection Approach
- Computer Science, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME)
- 2017
CCLEARNER is presented, the first solely token-based clone detection approach leveraging deep learning, which extracts tokens from known method-level code clones and nonclones to train a classifier, and then uses the classifier to detect clones in a given codebase.
CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code
- Computer Science, IEEE Trans. Software Eng.
- 2002
A new clone detection technique, consisting of a transformation of the input source text followed by a token-by-token comparison, is proposed; it effectively found clones, and the accompanying metrics effectively identified the characteristics of the systems.
Comparison and Evaluation of Clone Detection Tools
- Computer Science, IEEE Transactions on Software Engineering
- 2007
An experiment is presented that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC) and selects techniques that cover the whole spectrum of the state-of-the-art in clone detection.
Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code
- Computer Science, IJCAI
- 2017
Experiments on software clone detection benchmarks indicate that the CDLH approach is effective and outperforms the state-of-the-art approaches in software functional clone detection.
Evaluating clone detection tools with BigCloneBench
- Computer Science, 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME)
- 2015
BigCloneBench, a big data clone benchmark, is used to evaluate the recall of ten clone detection tools and it is found that the tools have strong recall for Type-1 and Type-2 clones, as well as Type-3 clones with high syntactical similarity.
Deep learning code fragments for code clone detection
- Computer Science, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE)
- 2016
This work introduces learning-based detection techniques in which everything for representing terms and fragments in source code is mined from the repository. It compares its approach to a traditional structure-oriented technique and finds that it detects clones that were either undetected or suboptimally reported by the prominent tool Deckard.
Comparison and evaluation of code clone detection techniques and tools: A qualitative approach
- Computer Science, Sci. Comput. Program.
- 2009
Scalable detection of semantic clones
- Computer Science, 2008 ACM/IEEE 30th International Conference on Software Engineering
- 2008
This paper efficiently solves the tree similarity problem to create a scalable analysis that locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.