Oreo: detection of clones in the twilight zone

@article{Saini2018OreoDO,
  title={Oreo: detection of clones in the twilight zone},
  author={Vaibhav Saini and Farima Farmahinifarahani and Yadong Lu and Pierre Baldi and Cristina V. Lopes},
  journal={Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year={2018}
}
  • V. Saini, Farima Farmahinifarahani, C. Lopes
  • Published 15 June 2018
  • Computer Science
  • Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Source code clones are categorized into four types of increasing difficulty of detection, ranging from purely textual (Type-1) to purely semantic (Type-4). Most clone detectors reported in the literature work well up to Type-3, which accounts for syntactic differences. In between Type-3 and Type-4, however, there lies a spectrum of clones that, although still exhibiting some syntactic similarities, are extremely hard to detect – the Twilight Zone. Most clone detectors reported in the literature… 
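To make the four-type taxonomy concrete, here is a minimal, token-only sketch. It is not Oreo's detection method. It labels a pair of Python fragments in three steps:

- identical token streams, after dropping layout and comments, suggest Type-1;
- identical streams after identifiers and literals are replaced with placeholders suggest Type-2;
- a token-sequence similarity at or above a threshold suggests Type-3.

The names `tokens` and `clone_type`, the `ID`/`LIT` placeholders, and the 0.7 threshold are hypothetical choices for illustration. A check like this cannot tell Type-4 clones apart from unrelated code, which hints at why clones between Type-3 and Type-4 are so hard to detect.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Layout and comment tokens, ignored when comparing fragments.
# (This toy view even ignores indentation, which Python treats as significant.)
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER}


def tokens(src: str, blind: bool = False) -> list[str]:
    """Lex Python source; with blind=True, map identifiers to 'ID' and literals to 'LIT'."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in _SKIP:
            continue
        if blind and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif blind and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def clone_type(a: str, b: str, threshold: float = 0.7) -> str:
    """Heuristic clone label for two fragments (hypothetical threshold)."""
    if tokens(a) == tokens(b):
        return "Type-1"  # differ only in whitespace, layout, or comments
    ba, bb = tokens(a, blind=True), tokens(b, blind=True)
    if ba == bb:
        return "Type-2"  # differ only in identifier names or literal values
    if SequenceMatcher(None, ba, bb).ratio() >= threshold:
        return "Type-3"  # statements added, removed, or changed
    return "no syntactic match (Type-4 or unrelated)"


base = "def add(a, b):\n    return a + b\n"
print(clone_type(base, "def add(a,  b):  # sum\n    return a+b\n"))  # Type-1
print(clone_type(base, "def plus(x, y):\n    return x + y\n"))  # Type-2
print(clone_type(base, "def plus(x, y):\n    total = x + y\n    return total\n"))  # Type-3
```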


On Precision of Code Clone Detection Tools
TLDR
This work shows that the reported precision of these tools leads to significantly different conclusions and insights about the tools when different types of clones are taken into account, and stresses, once again, the importance of reporting inter-rater agreement.
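The sketch below illustrates why precision alone can mislead. It computes precision over a manually judged sample, alongside Cohen's kappa, one widely used inter-rater agreement statistic. The TLDR does not say which agreement measure the paper reports, and the verdicts below are made up. Both raters arrive at the same precision, yet their agreement is only moderate.

```python
from collections import Counter


def precision(judgements: list[bool]) -> float:
    """Share of sampled reported clone pairs that a rater judged to be true clones."""
    return sum(judgements) / len(judgements)


def cohen_kappa(r1: list[str], r2: list[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(r1)
    observed = sum(x == y for x, y in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: probability that both raters independently pick the same label.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts (T = true clone, F = false positive) on 8 sampled pairs.
rater1 = ["T", "T", "T", "F", "F", "T", "T", "F"]
rater2 = ["T", "T", "F", "F", "F", "T", "T", "T"]
print(precision([v == "T" for v in rater1]))  # 0.625
print(precision([v == "T" for v in rater2]))  # 0.625
print(round(cohen_kappa(rater1, rater2), 3))  # 0.467
```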
srcClone: Detecting Code Clones via Decompositional Slicing
TLDR
This paper presents a scalable slicing-based approach for detecting code clones, including semantic clones, and determines code segment similarity based on their corresponding program slices by taking advantage of a lightweight, publicly available, and scalable program slicing approach.
Improving Clone Detection Precision Using Machine Learning Techniques
TLDR
This paper proposes an approach for increasing the precision of code clone detection using machine learning techniques and finds that a decision tree clone filter helps reduce the number of false-positive clone classes reported by iClones, a well-known code clone detector.
Clone Detection on Large Scala Codebases
TLDR
Large-scale experimental research on the performance of two state-of-the-art code clone detection techniques, SourcererCC and AutoenCODE, on both open source projects and an industrial project written in Scala reveals that the two algorithms perform differently on the industrial project.
Enhancing code clone detection using control flow graphs
  • Dong Kwan Kim
  • Computer Science
  • International Journal of Electrical and Computer Engineering (IJECE)
  • 2019
TLDR
Experimental results demonstrate that using CFG features is a viable methodology in terms of the effectiveness of clone detection for both syntactic and semantic clones.
SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge
TLDR
This work proposes a methodology for generating a wide range of semantic clone benchmarks for different programming languages with minimal human validation, based on knowledge provided by developers who participate in the crowd-sourced information website Stack Overflow.
NIL: large-scale detection of large-variance clones
TLDR
NIL is a token-based clone detector that efficiently identifies clone candidates using an N-gram representation of token sequences and an inverted index and verifies the clone candidates by measuring their similarity based on the longest common subsequence between their token sequences.
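The two NIL phases named in the TLDR can be sketched as follows, under these simplifying assumptions:

- fragments are already tokenized;
- a pair becomes a candidate once it shares at least `min_shared` distinct N-grams;
- verification divides the LCS length by the length of the shorter token sequence.

The function names, parameters, and default values are illustrative, not NIL's actual settings.

```python
from collections import defaultdict
from itertools import combinations


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (one-row dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def detect(fragments: dict[str, list[str]], n: int = 3,
           min_shared: int = 2, threshold: float = 0.7) -> list[tuple[str, str]]:
    # Inverted index: N-gram -> ids of the fragments that contain it.
    index = defaultdict(set)
    for fid, toks in fragments.items():
        for gram in ngrams(toks, n):
            index[gram].add(fid)
    # Candidate generation: count distinct N-grams each pair shares,
    # so pairs with no N-gram in common are never compared.
    shared = defaultdict(int)
    for ids in index.values():
        for pair in combinations(sorted(ids), 2):
            shared[pair] += 1
    # Verification: LCS length relative to the shorter token sequence.
    clones = []
    for (i, j), count in sorted(shared.items()):
        a, b = fragments[i], fragments[j]
        if count >= min_shared and lcs_len(a, b) / min(len(a), len(b)) >= threshold:
            clones.append((i, j))
    return clones


frags = {
    "A": "ID = ID + ID ; return ID".split(),
    "B": "ID = ID + ID ; print ( ID ) ; return ID".split(),
    "C": "for ID in ID : yield ID".split(),
}
print(detect(frags))  # [('A', 'B')]
```

Fragment A is a subsequence of B despite B's inserted `print` call, a large-variance edit that a strict token-by-token match would break on.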
Semantic Code Clone Detection Via Event Embedding Tree and GAT Network
TLDR
This work proposes a code clone detection method based on an event embedding tree and a Graph Attention Network (GAT) that calculates the functional similarity of two pieces of code, thereby identifying semantically similar code fragments.
Code Clone Detection: A Literature Review
TLDR
A literature review of code clone detection, particularly from the perspective of source code representation, is presented, and the key issues in code clone research are summarized from three aspects: scientific, practical, and technical difficulties.

References

(References 1-10 of 60 shown.)
Semantic Clone Detection Using Machine Learning
TLDR
A machine learning framework for automatically detecting clones in software is presented; it can detect Type-3 clones and the most complicated kind, Type-4 clones.
SourcererCC: Scaling Code Clone Detection to Big-Code
TLDR
This paper presents SourcererCC, a token-based clone detector that can detect both exact and near-miss clones in large inter-project repositories using a standard workstation; it evaluates SourcererCC's scalability, execution time, recall, and precision, and compares it with four publicly available, state-of-the-art tools.
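SourcererCC is known for measuring bag-of-tokens overlap and for pruning comparisons with a prefix-based index (its "sub-block overlap filtering"). Below is a simplified sketch of those two ideas, assuming pre-tokenized code blocks. Two blocks count as clones when their shared tokens reach ceil(θ · max(|B1|, |B2|)). Only blocks that share a token in their rarest-token prefixes are ever compared. SourcererCC's further filters and data layout are omitted, and the function names and 0.8 default are illustrative.

```python
import math
from collections import Counter, defaultdict


def overlap(b1: Counter, b2: Counter) -> int:
    """Number of tokens the two bags share, counting duplicates."""
    return sum((b1 & b2).values())


def prefix(bag: Counter, order: dict[str, int], theta: float) -> list[str]:
    """Rarest tokens of a bag; any clone partner must share at least one of them."""
    toks = sorted(bag.elements(), key=order.__getitem__)
    size = len(toks)
    return toks[: size - math.ceil(theta * size) + 1]


def detect(blocks: dict[str, list[str]], theta: float = 0.8) -> list[tuple[str, str]]:
    bags = {bid: Counter(toks) for bid, toks in blocks.items()}
    # Global token order: rarest first, ties broken alphabetically.
    freq = Counter()
    for bag in bags.values():
        freq.update(bag)
    ranked = sorted(freq, key=lambda t: (freq[t], t))
    order = {t: rank for rank, t in enumerate(ranked)}

    index = defaultdict(set)  # prefix token -> ids of blocks indexed so far
    clones = set()
    for bid in sorted(bags):
        bag = bags[bid]
        head = prefix(bag, order, theta)
        candidates = set().union(*(index[t] for t in head))
        for other in candidates:
            size = max(sum(bag.values()), sum(bags[other].values()))
            if overlap(bag, bags[other]) >= math.ceil(theta * size):
                clones.add(tuple(sorted((bid, other))))
        for t in head:
            index[t].add(bid)
    return sorted(clones)


blocks = {
    "b1": "int sum = a + b ; return sum ;".split(),
    "b2": "int total = a + b ; return total ;".split(),
    "b3": "for ( i = 0 ; i < n ; i ++ )".split(),
}
print(detect(blocks))  # [('b1', 'b2')]
```

Indexing only the rare-token prefixes keeps the candidate sets small. Block b3 is never compared with b1 or b2 at all.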
CCLearner: A Deep Learning-Based Clone Detection Approach
TLDR
CCLearner is presented: the first solely token-based clone detection approach leveraging deep learning. It extracts tokens from known method-level code clones and non-clones to train a classifier, and then uses the classifier to detect clones in a given codebase.
CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code
TLDR
A new clone detection technique is proposed that transforms the input source text and then compares it token by token; the technique effectively found clones, and the metrics effectively identified characteristics of the analyzed systems.
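Here is a toy version of the two steps in the CCFinder TLDR, assuming C-like input. A regex tokenizer feeds a transformation that replaces every identifier and literal with one placeholder, so renamed copies look identical. A comparison step then groups identical windows of k transformed tokens. CCFinder itself uses a richer transformation and suffix-tree matching. The fixed-window grouping here is a simpler stand-in, and `KEYWORDS`, `transform`, and `clone_windows` are hypothetical names.

```python
import re
from collections import defaultdict

# Identifiers, integer literals, a few two-character operators, else any single char.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|\+\+|--|\S")
KEYWORDS = {"int", "char", "void", "for", "while", "if", "else", "return"}  # small illustrative subset


def transform(code: str) -> list[str]:
    """Tokenize, then replace every identifier and literal with the placeholder '$p'."""
    out = []
    for tok in TOKEN.findall(code):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalnum() or tok[0] == "_":
            out.append("$p")
        else:
            out.append(tok)
    return out


def clone_windows(files: dict[str, str], k: int = 8) -> list[list[tuple[str, int]]]:
    """Group every run of k transformed tokens by content; groups of 2+ are clones."""
    seen = defaultdict(list)
    for name, code in files.items():
        toks = transform(code)
        for i in range(len(toks) - k + 1):
            seen[tuple(toks[i:i + k])].append((name, i))
    return [locs for locs in seen.values() if len(locs) > 1]


files = {
    "a.c": "int f(int x) { return x * 2 + 1; }",
    "b.c": "int g(int y) { return y * 3 + 7; }",
}
groups = clone_windows(files)
print(len(groups))  # 8: every 8-token window of a.c reappears at the same offset in b.c
```

A fuller tool would merge overlapping matched windows into maximal clone pairs before reporting them.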
Comparison and Evaluation of Clone Detection Tools
TLDR
An experiment is presented that evaluates six clone detectors on eight large C and Java programs (altogether almost 850 KLOC); the selected techniques cover the whole spectrum of the state of the art in clone detection.
Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code
TLDR
Experiments on software clone detection benchmarks indicate that the CDLH approach is effective and outperforms the state-of-the-art approaches in software functional clone detection.
Evaluating clone detection tools with BigCloneBench
  • Jeffrey Svajlenko, C. Roy
  • Computer Science
  • 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME)
  • 2015
TLDR
BigCloneBench, a big data clone benchmark, is used to evaluate the recall of ten clone detection tools and it is found that the tools have strong recall for Type-1 and Type-2 clones, as well as Type-3 clones with high syntactical similarity.
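BigCloneBench reports recall per clone category and splits Type-3 clones by syntactical similarity into four bands:

- very-strongly Type-3 (VST3, 90-100%);
- strongly Type-3 (ST3, 70-90%);
- moderately Type-3 (MT3, 50-70%);
- weakly Type-3/Type-4 (WT3/T4, below 50%).

Below is a minimal sketch of that bookkeeping. It assumes a reference clone counts as found only when the tool reports exactly the same pair. The benchmark's own evaluation tooling applies its own clone-matching criterion instead. The data below are made up.

```python
from collections import defaultdict


def category(clone_type: int, similarity: float) -> str:
    """BigCloneBench-style category; Type-3/4 split by syntactical similarity."""
    if clone_type in (1, 2):
        return f"T{clone_type}"
    if similarity >= 0.9:
        return "VST3"
    if similarity >= 0.7:
        return "ST3"
    if similarity >= 0.5:
        return "MT3"
    return "WT3/T4"


def recall_by_category(reference, detected):
    """reference: (pair, clone_type, similarity) triples; detected: set of reported pairs."""
    total, found = defaultdict(int), defaultdict(int)
    for clone_pair, ctype, sim in reference:
        cat = category(ctype, sim)
        total[cat] += 1
        found[cat] += clone_pair in detected  # exact-pair matching (a simplification)
    return {cat: found[cat] / total[cat] for cat in total}


def pair(a, b):
    return frozenset((a, b))  # unordered clone pair


reference = [
    (pair("A", "B"), 1, 1.00),
    (pair("C", "D"), 2, 1.00),
    (pair("E", "F"), 3, 0.95),
    (pair("G", "H"), 3, 0.60),
    (pair("I", "J"), 3, 0.30),
    (pair("K", "L"), 3, 0.20),
]
detected = {pair("A", "B"), pair("C", "D"), pair("F", "E"), pair("I", "J")}
print(recall_by_category(reference, detected))
# {'T1': 1.0, 'T2': 1.0, 'VST3': 1.0, 'MT3': 0.0, 'WT3/T4': 0.5}
```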
Deep learning code fragments for code clone detection
TLDR
This work introduces learning-based detection techniques in which everything needed to represent terms and fragments in source code is mined from the repository. Compared with a traditional structure-oriented technique, the approach detected clones that were either undetected or suboptimally reported by the prominent tool Deckard.
Scalable detection of semantic clones
TLDR
This paper efficiently solves the tree similarity problem to create a scalable analysis that locates significantly more clones, which are often more semantically interesting than simple copied-and-pasted code fragments.
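The TLDR does not say how the tree similarity problem is solved. One well-known way to approximate tree similarity at scale is Deckard's characteristic vectors. Each (sub)tree is summarized by counts of its node types, so tree similarity becomes vector distance, which can then be indexed (Deckard uses locality-sensitive hashing). This minimal sketch with Python's `ast` module assumes the fragments parse as Python. It counts every node type rather than a selected subset and omits the hashing step.

```python
import ast
import math
from collections import Counter


def char_vector(src: str) -> Counter:
    """Count the AST node types in a fragment (a simplified characteristic vector)."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(src)))


def distance(v1: Counter, v2: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    return math.sqrt(sum((v1[k] - v2[k]) ** 2 for k in v1.keys() | v2.keys()))


loop_a = "total = 0\nfor x in xs:\n    total += x\n"
loop_b = "s = 0\nfor item in items:\n    s += item\n"
other = "print('hello')\n"
print(distance(char_vector(loop_a), char_vector(loop_b)))  # 0.0: same tree shape, renamed identifiers
print(distance(char_vector(loop_a), char_vector(other)) > 0)  # True
```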