Efficiently Measuring an Accurate and Generalized Clone Detection Precision using Clone Clustering

@inproceedings{Svajlenko2016EfficientlyMA,
  title={Efficiently Measuring an Accurate and Generalized Clone Detection Precision using Clone Clustering},
  author={Jeffrey Svajlenko and C. Roy},
  booktitle={SEKE},
  year={2016}
}
An important measure of clone detection performance is precision. However, there has been a marked lack of research into methods of efficiently and accurately measuring the precision of a clone detection tool. Instead, tool authors simply validate a small random sample of the clones their tools detected in a subject software system. Since there could be many thousands of clones reported by the tool, such a small random sample cannot guarantee an accurate and generalized measure of the tool's…
Benchmarks for software clone detection: A ten-year retrospective
  • C. Roy, J. Cordy
  • Computer Science
  • 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)
  • 2018
TLDR
A history and overview of software clone detection benchmarks is presented, along with the steps taken by the authors and others to reach this stage, to encourage researchers to both use existing benchmarks and contribute to building the benchmarks of the future.
A conceptual framework for clone detection using machine learning
TLDR
This paper uses the generated descriptions for two code snippets as a metric to measure the similarities between them and proposes a vector similarity measure to calculate a similarity indicator between these measures which can decide which code snippets are clones.

References

SHOWING 1-10 OF 26 REFERENCES
Evaluating Modern Clone Detection Tools
  • Jeffrey Svajlenko, C. Roy
  • Computer Science
  • 2014 IEEE International Conference on Software Maintenance and Evolution
  • 2014
TLDR
An evaluation of the recall of eleven modern clone detection tools using four benchmark frameworks concludes that Bellon's Framework may not be accurate for modern tools, and that an update of its corpus with clones detected by the modern tools is warranted.
A Mutation/Injection-Based Automatic Framework for Evaluating Code Clone Detection Tools
  • C. Roy, J. Cordy
  • Computer Science
  • 2009 International Conference on Software Testing, Verification, and Validation Workshops
  • 2009
TLDR
An automated method for empirically evaluating clone detection tools is proposed that leverages mutation-based techniques to overcome limitations by automatically synthesizing large numbers of known clones based on an editing theory of clone creation.
Evaluating clone detection tools with BigCloneBench
  • Jeffrey Svajlenko, C. Roy
  • Computer Science
  • 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME)
  • 2015
TLDR
BigCloneBench, a big data clone benchmark, is used to evaluate the recall of ten clone detection tools and it is found that the tools have strong recall for Type-1 and Type-2 clones, as well as Type-3 clones with high syntactical similarity.
Towards a Big Data Curated Benchmark of Inter-project Code Clones
TLDR
A Big Data clone detection benchmark consisting of known true and false positive clones in a Big Data inter-project Java repository is presented, and it is shown how the benchmark can be used to measure the recall and precision of clone detection techniques.
SourcererCC: Scaling Code Clone Detection to Big-Code
TLDR
This paper presents a token-based clone detector, SourcererCC, that can detect both exact and near-miss clones from large inter-project repositories using a standard workstation, evaluates its scalability, execution time, recall and precision, and compares it to four publicly available and state-of-the-art tools.
Empirical evaluation of clone detection using syntax suffix trees
TLDR
This paper describes how to make use of suffix trees to find syntactic clones in abstract syntax trees and reports the results of a large case study that empirically compares the new technique to other techniques using the Bellon benchmark for clone detectors.
Comparison and Evaluation of Clone Detection Tools
TLDR
An experiment is presented that evaluates six clone detectors based on eight large C and Java programs (altogether almost 850 KLOC) and selects techniques that cover the whole spectrum of the state-of-the-art in clone detection.
A Survey on Software Clone Detection Research
TLDR
The state of the art in clone detection research is surveyed, the clone terms commonly used in the literature are described along with their corresponding mappings to the commonly used clone types, and several open problems related to clone detection research are pointed out.
DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones
TLDR
This paper presents an efficient algorithm for identifying similar subtrees, applies it to tree representations of source code, implements it as a clone detection tool called DECKARD, and evaluates it on large code bases written in C and Java, including the Linux kernel and JDK.
NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization
  • C. Roy, J. Cordy
  • Computer Science
  • 2008 16th IEEE International Conference on Program Comprehension
  • 2008
TLDR
A new language-specific parser-based but lightweight clone detection approach exploiting a novel application of a source transformation system that is capable of finding near-miss clones with high precision and recall, and with reasonable performance.