Towards Semantic Clone Detection via Probabilistic Software Modeling

  title={Towards Semantic Clone Detection via Probabilistic Software Modeling},
  author={Hannes Thaller and Lukas Linsbauer and Alexander Egyed},
  journal={2020 IEEE 14th International Workshop on Software Clones (IWSC)},
Semantic clones are program components with similar behavior but different textual representations. Semantic similarity is hard to detect, and semantic clone detection is still an open problem. We present semantic clone detection via Probabilistic Software Modeling (PSM) as a robust method for detecting semantically equivalent methods. PSM inspects the structure and runtime behavior of a program and synthesizes a network of Probabilistic Models (PMs). Each PM in the network represents a method in…
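The abstract describes comparing methods through probabilistic models of their runtime behavior. As a rough illustration only, the Python sketch below assumes each method is observed as (input, output) pairs and substitutes a toy linear-Gaussian model for the PMs that PSM actually synthesizes. The helper names (fit_linear_gaussian, behavioral_similarity, observe) are hypothetical. Two methods are treated as behaviorally similar when each one's model explains the other's observations about as well as its own.

```python
import math
import random


def fit_linear_gaussian(pairs):
    """Fit y ~ N(a*x + b, var) by least squares; a toy stand-in for a PM."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    var = sum((y - (a * x + b)) ** 2 for x, y in pairs) / n
    return a, b, max(var, 1e-6)  # floor the variance for deterministic methods


def mean_log_likelihood(model, pairs):
    """Average Gaussian log-density of the observed pairs under the model."""
    a, b, var = model
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (y - (a * x + b)) ** 2 / (2 * var)
        for x, y in pairs
    ) / len(pairs)


def behavioral_similarity(pairs_a, pairs_b):
    """Negated mean likelihood drop when each model scores the other method's data.

    Values near 0 mean each model explains the other's observations about as
    well as its own; large negative values mean the behaviors differ.
    """
    model_a = fit_linear_gaussian(pairs_a)
    model_b = fit_linear_gaussian(pairs_b)
    drop_a = mean_log_likelihood(model_a, pairs_a) - mean_log_likelihood(model_a, pairs_b)
    drop_b = mean_log_likelihood(model_b, pairs_b) - mean_log_likelihood(model_b, pairs_a)
    return -(drop_a + drop_b) / 2


# Usage: three textually different methods, two of which behave identically.
rng = random.Random(0)


def observe(fn, n=50):
    """Record (input, output) pairs from running fn on random integer inputs."""
    return [(x, fn(x)) for x in (rng.randint(-50, 50) for _ in range(n))]


def double_add(x):
    return x + x


def double_mul(x):
    return 2 * x


def square(x):
    return x * x


same = behavioral_similarity(observe(double_add), observe(double_mul))
different = behavioral_similarity(observe(double_add), observe(square))
```

A score near 0 suggests equivalent behavior (here, x + x versus 2 * x), while a large negative score suggests different behavior. A real detector would need far richer models and a calibrated decision threshold.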
Semantic Clone Detection via Probabilistic Software Modeling
This work contributes a semantic clone detection approach that detects clones with 0% syntactic similarity and presents SCD-PSM, a stable and precise solution to semantic clone detection via Probabilistic Software Modeling.


Scalable detection of semantic clones
This paper efficiently solves the tree similarity problem to create a scalable analysis that locates significantly more clones, which are often more semantically interesting than simple copied-and-pasted code fragments.
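One common way to make tree similarity tractable is to approximate each syntax tree by a characteristic vector of node-type counts, so that similar trees map to nearby vectors. The Python sketch below illustrates only that idea, applied to whole functions with the standard ast module; it does not reproduce the cited paper's vector definition, its handling of semantic information, or the indexing it uses to scale. The function names are hypothetical.

```python
import ast
import math
from collections import Counter


def characteristic_vector(source):
    """Count how often each AST node type occurs in the parsed source."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def vector_distance(u, v):
    """Euclidean distance between two sparse count vectors (missing keys count as 0)."""
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in set(u) | set(v)))


src_add = "def f(a, b):\n    return a + b * 2\n"
src_renamed = "def g(x, y):\n    return x + y * 2\n"
src_loop = "def h(xs):\n    total = 0\n    for v in xs:\n        total += v\n    return total\n"

d_renamed = vector_distance(characteristic_vector(src_add), characteristic_vector(src_renamed))
d_loop = vector_distance(characteristic_vector(src_add), characteristic_vector(src_loop))
```

Renaming identifiers leaves the vector unchanged (distance 0), which is why such vectors catch Type-2 clones, while the accumulation loop yields a nonzero distance.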
Comparison and Evaluation of Clone Detection Tools
An experiment is presented that evaluates six clone detectors on eight large C and Java programs (altogether almost 850 KLOC); the detectors were selected to cover the whole spectrum of the state of the art in clone detection.
Challenges of the Dynamic Detection of Functionally Similar Code Fragments
A study of a dynamic detection approach that applies random testing to selected chunks of code, similar to Jiang and Su's approach, found that such an approach faces several limitations when applied to diverse Java systems.
Oreo: detection of clones in the twilight zone
Oreo is presented, a novel approach to source code clone detection that not only detects Type-1 to Type-3 clones accurately but is also capable of detecting harder-to-detect clones in the Twilight Zone.
Probabilistic Software Modeling: A Data-driven Paradigm for Software Analysis
Probabilistic Software Modeling is presented, a data-driven modeling paradigm for predictive and generative methods in software engineering that analyzes a program and synthesizes a network of probabilistic models that can simulate and quantify the original program's behavior.
On Precision of Code Clone Detection Tools
This work shows that the reported precision of clone detection tools leads to significantly different conclusions and insights about the tools when different types of clones are taken into account, and stresses, once again, the importance of reporting inter-rater agreement.
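Inter-rater agreement on manually validated clone candidates is commonly reported as Cohen's kappa, which corrects the raw agreement rate p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two hypothetical raters labeling the same eight candidate pairs; the labels are invented for illustration.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


rater_a = ["clone", "clone", "not", "clone", "not", "not", "clone", "clone"]
rater_b = ["clone", "not", "not", "clone", "not", "clone", "clone", "clone"]
kappa = cohens_kappa(rater_a, rater_b)
```

The raters agree on 6 of 8 pairs (p_o = 0.75), but chance agreement is p_e = 0.53125, so kappa is 7/15 ≈ 0.47, noticeably lower than the raw agreement rate.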
Using Slicing to Identify Duplication in Source Code
The design and initial implementation of a tool that finds clones and displays them to the programmer is described; the tool uses program dependence graphs (PDGs) and program slicing to find isomorphic PDG subgraphs that represent clones.
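Slicing on a program dependence graph amounts to following dependence edges backward from a chosen statement (the slicing criterion). The sketch below assumes a hypothetical PDG given as a dictionary mapping each statement number to the statements it depends on (data or control); it shows only backward slicing, not the search for isomorphic subgraphs described in the paper.

```python
def backward_slice(deps, criterion):
    """Return every statement the criterion transitively depends on, itself included."""
    in_slice = set()
    worklist = [criterion]
    while worklist:
        node = worklist.pop()
        if node in in_slice:
            continue
        in_slice.add(node)
        worklist.extend(deps.get(node, []))
    return in_slice


# Hypothetical PDG (statement -> statements it depends on) for:
#   1: total = 0
#   2: prod = 1
#   3: i = 1
#   4: while i < 11:
#   5:     total = total + i
#   6:     prod = prod * i
#   7:     i = i + 1
#   8: print(total)
#   9: print(prod)
deps = {
    1: [], 2: [], 3: [],
    4: [3, 7],           # loop test reads i
    5: [1, 3, 4, 5, 7],  # reads total and i; controlled by the loop
    6: [2, 3, 4, 6, 7],  # reads prod and i; controlled by the loop
    7: [3, 4, 7],
    8: [1, 5],
    9: [2, 6],
}
total_slice = backward_slice(deps, 8)
prod_slice = backward_slice(deps, 9)
```

The two slices, {1, 3, 4, 5, 7, 8} and {2, 3, 4, 6, 7, 9}, have the same shape, which is the kind of structural correspondence a PDG-based detector would look for.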
Automatic mining of functionally equivalent code fragments via random testing
The results show that there exist many functionally equivalent code fragments that are syntactically different (i.e., they are unlikely to result from copying and pasting code), and the algorithm was able to analyze the Linux kernel with several days of parallel processing.
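The random-testing idea can be illustrated by running candidate functions on the same randomly generated inputs and grouping those whose outputs agree on every input. In the Python sketch below, cluster_by_behavior and the candidate functions are hypothetical; the sketch assumes single-integer inputs and functions that do not raise, whereas the cited work mines code fragments extracted from real programs such as the Linux kernel.

```python
import math
import random
from collections import defaultdict


def cluster_by_behavior(functions, num_inputs=100, seed=0):
    """Group functions whose outputs match on every randomly generated input."""
    rng = random.Random(seed)
    inputs = [rng.randint(-1000, 1000) for _ in range(num_inputs)]
    clusters = defaultdict(list)
    for name, fn in functions.items():
        # The output vector acts as a behavioral fingerprint of the function.
        fingerprint = tuple(fn(x) for x in inputs)
        clusters[fingerprint].append(name)
    return [sorted(names) for names in clusters.values()]


def abs_branch(x):
    return x if x >= 0 else -x


def abs_isqrt(x):
    return math.isqrt(x * x)


def negate(x):
    return -x


candidates = {
    "abs_builtin": abs,
    "abs_branch": abs_branch,
    "abs_isqrt": abs_isqrt,
    "negate": negate,
}
clusters = cluster_by_behavior(candidates)
```

The three absolute-value implementations share almost no syntax yet land in one cluster, while negate forms its own. Agreement on finitely many random inputs is evidence of equivalence, not proof.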
Survey of Research on Software Clones
  • R. Koschke
  • Biology, Computer Science
  • Duplication, Redundancy, and Similarity in Software
  • 2006
This report summarizes the notions of software redundancy, cloning, duplication, and similarity. It describes various categorizations of clone types, empirical studies on the root causes of cloning, current opinions and wisdom on the consequences of cloning, empirical studies on the evolution of clones, ways to remove, avoid, and detect clones, and empirical evaluations of the performance of existing automatic clone detectors.
Evaluating clone detection tools with BigCloneBench
  • Jeffrey Svajlenko, C. Roy
  • Computer Science
  • 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME)
  • 2015
BigCloneBench, a big data clone benchmark, is used to evaluate the recall of ten clone detection tools and it is found that the tools have strong recall for Type-1 and Type-2 clones, as well as Type-3 clones with high syntactical similarity.
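Recall against a benchmark such as BigCloneBench is the fraction of known (reference) clone pairs that a tool reports, broken down by clone type. The sketch below assumes exact, order-insensitive matching of method pairs with hypothetical identifiers; the benchmark's evaluation tooling typically tolerates partial overlap between a reported clone and a reference clone, which is not reproduced here.

```python
from collections import defaultdict


def recall_by_type(reference, detected):
    """Fraction of reference clone pairs found by the tool, per clone type."""
    detected_pairs = {frozenset(pair) for pair in detected}  # order-insensitive
    found = defaultdict(int)
    total = defaultdict(int)
    for pair, clone_type in reference:
        total[clone_type] += 1
        if frozenset(pair) in detected_pairs:
            found[clone_type] += 1
    return {clone_type: found[clone_type] / total[clone_type] for clone_type in total}


reference = [
    (("A", "B"), "Type-1"),
    (("C", "D"), "Type-2"),
    (("E", "F"), "Type-3"),
    (("G", "H"), "Type-3"),
]
detected = [("B", "A"), ("C", "D"), ("E", "F")]
recall = recall_by_type(reference, detected)
```

On this toy data the tool reaches recall 1.0 for Type-1 and Type-2 but only 0.5 for Type-3, since one of the two Type-3 reference pairs is missing from its report.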