Guilty or Not Guilty: Using Clone Metrics to Determine Open Source Licensing Violations

  • Akito Monden, Satoshi Okahara, Yuki Manabe, Ken-ichi Matsumoto
    IEEE Software
To increase productivity, programmers often unwittingly violate open source software licenses by reusing code fragments, or clones. The authors explore metrics that can reveal the existence or absence of code reuse and apply these metrics to 1,225 open source product pairs. 
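The abstract does not spell out the metrics. As a purely illustrative sketch, assume each product in a pair is compared as a source string. Two simple token-based measures can then hint at reuse: the fraction of one product's 5-token windows that also appear in the other, and the longest token run the two share. The tokenizer, the window size `k=5`, and the names `shared_ratio` and `longest_shared_run` are hypothetical and are not the authors' metrics.

```python
import re

# Identifiers/keywords, integer literals, or any single non-space symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(source: str) -> list[str]:
    """Split source text into a flat token list."""
    return TOKEN_RE.findall(source)


def kgrams(toks: list[str], k: int) -> set[tuple[str, ...]]:
    """All contiguous windows of k tokens."""
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def shared_ratio(a: str, b: str, k: int = 5) -> float:
    """Hypothetical metric: fraction of A's k-token windows that also occur in B."""
    ga, gb = kgrams(tokens(a), k), kgrams(tokens(b), k)
    if not ga:
        return 0.0  # A is shorter than one window
    return len(ga & gb) / len(ga)


def longest_shared_run(a: str, b: str) -> int:
    """Hypothetical metric: length of the longest token sequence found in both A and B
    (longest-common-substring dynamic programming over tokens)."""
    ta, tb = tokens(a), tokens(b)
    best = 0
    prev = [0] * (len(tb) + 1)
    for i in range(1, len(ta) + 1):
        cur = [0] * (len(tb) + 1)
        for j in range(1, len(tb) + 1):
            if ta[i - 1] == tb[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best
```

A high `shared_ratio` together with a long shared run would suggest copying. Any cut-off for a "guilty" verdict is not given here and would need calibration against known cases.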
On the robustness of clone detection to code obfuscation
This paper presents a framework for semi-automated code obfuscation, together with a case study that evaluates how robust selected clone detectors are against such obfuscations.
Expose: Discovering Potential Binary Code Re-use
  • Beng Heng Ng, A. Prakash
    2013 IEEE 37th Annual Computer Software and Applications Conference
  • 2013
Expose is a tool that combines symbolic execution using a theorem prover with function-level syntactic matching to achieve both good performance and high-quality rankings of applications.
An Empirical Study of Fault Prediction with Code Clone Metrics
  • Yasutaka Kamei, Hiroki Sato, +5 authors N. Ubayashi
    2011 Joint Conference of the 21st International Workshop on Software Measurement and the 6th International Conference on Software Process and Product Measurement
  • 2011
The study analyzed the relationship between clone metrics and fault density. The results suggest that clone metrics are effective for fault prediction in large modules but not in small ones.
Research on the Model of Legacy Software Reuse Based on Code Clone Detection
  • Meng Fanqi, Kan Yunqi
    2013 5th International Conference on Computational Intelligence and Communication Networks
  • 2013
Test results show that the reuse method narrows the search scope for reusable components in legacy software systems and thus improves the efficiency of legacy software reuse.
API trustworthiness: an ontological approach for software library adoption
The paper introduces the Ontological Trustworthiness Assessment Model (OntTAM), which supports automated analysis and assessment of trustworthiness-related quality attributes of libraries and APIs in open-source systems. It gives developers additional insight into how reused libraries and APIs may affect the quality and trustworthiness of their projects.
Industry Questions about Open Source Software in Business: Research Directions and Potential Answers
This paper focuses on frequently asked questions about OSS in industry. It tries to answer them, or to suggest research directions, based on lessons learned from recent research on mining OSS repositories.
Towards Least Privilege Principle: Limiting Unintended Accesses in Software Systems.
Design and Development of an Efficient Software Clone Detection Technique
The study reports using clone detection to find commonalities in source code, in the form of domain concepts. These commonalities help analysts understand the system's design and thus maintain it better.
Software clone detection: A systematic review
An extensive systematic literature review of software clones in general and software clone detection in particular calls for an increased awareness of the potential benefits of software clone management, and identifies the need to develop semantic and model clone detection techniques.
CodeScoping: A Source Code Based Tool to Software Product Lines Scoping
This paper proposes an approach that supports the scoping process using the source code of existing products, reducing the cost and time of scoping.


Comparison and Evaluation of Clone Detection Tools
The paper presents an experiment that evaluates six clone detectors on eight large C and Java programs (almost 850 KLOC in total). The detectors were selected to cover the whole spectrum of state-of-the-art clone detection techniques.
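To evaluate detectors against a reference set, one must decide when a reported clone pair "hits" a reference pair. A minimal sketch follows. It assumes each clone pair consists of two fragments given as line ranges in the same two files. It scores the match by line overlap on each fragment (shared lines divided by the union) and keeps the weaker of the two scores. The 0.7 threshold and the function names are assumptions of this sketch and are not necessarily the paper's exact procedure.

```python
def overlap(a: range, b: range) -> float:
    """Lines shared by two fragments divided by the lines covered by either (0.0 if both empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def good_match(candidate: tuple[range, range],
               reference: tuple[range, range],
               threshold: float = 0.7) -> bool:
    """A candidate clone pair hits a reference pair only if BOTH fragments overlap enough.
    Fragment i of the candidate is compared with fragment i of the reference (same file assumed)."""
    score = min(overlap(candidate[0], reference[0]),
                overlap(candidate[1], reference[1]))
    return score >= threshold
```

Taking the minimum means a detector cannot earn a hit by locating only one side of a clone pair.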
CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code
The paper proposes a new clone detection technique that transforms the input source text and then compares it token by token. The technique effectively found clones, and the accompanying metrics effectively identified the characteristics of the systems studied.
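The transform-then-compare idea can be sketched for a tiny C-like token set. Identifiers, type names, and numeric literals are all replaced by a placeholder `$p`, while control-flow keywords and symbols are kept. The sketch then reports every run of at least `min_len` normalized tokens that occurs in both inputs. CCFinder itself uses language-specific transformation rules and a suffix-tree algorithm. The keyword list, the hash-of-windows index, and the function names below are simplifications for illustration.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Assumed keyword list; every other identifier (including type names) becomes a parameter.
KEYWORDS = {"if", "else", "for", "while", "return"}


def normalize(source: str) -> list[str]:
    """Replace identifiers, type names, and numeric literals with the placeholder '$p'."""
    out = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalnum() or tok[0] == "_":
            out.append("$p")
        else:
            out.append(tok)
    return out


def clone_windows(a: str, b: str, min_len: int = 8) -> list[tuple[int, int]]:
    """Token offsets (i in A, j in B) where min_len consecutive normalized tokens match."""
    ta, tb = normalize(a), normalize(b)
    index = defaultdict(list)  # window of normalized tokens -> start offsets in B
    for j in range(len(tb) - min_len + 1):
        index[tuple(tb[j:j + min_len])].append(j)
    hits = []
    for i in range(len(ta) - min_len + 1):
        hits.extend((i, j) for j in index.get(tuple(ta[i:i + min_len]), []))
    return hits
```

Adjacent windows overlap, so a long clone appears as a diagonal run of hits (i, j), (i+1, j+1), and so on. Merging such runs into maximal clones is omitted for brevity.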
Identifying similar code with program dependence graphs
  • J. Krinke
    Proceedings Eighth Working Conference on Reverse Engineering
  • 2001
This approach identifies similar code in programs by finding similar subgraphs in attributed directed graphs. It considers not only the syntactic structure of programs but also their data flow, as an abstraction of the semantics.
On finding duplication and near-duplication in large software systems
  • B. Baker
    Proceedings of 2nd Working Conference on Reverse Engineering
  • 1995
A program called dup locates instances of duplication or near-duplication in a software system and is shown to be both fast and effective at locating duplication.
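Near-duplication of this kind tolerates consistent renaming of identifiers. One way to express that is the parameterized-string "prev" encoding from Baker's related work. Each parameter token is replaced by the distance back to its previous occurrence (0 on first use), so two fragments that differ only by a one-to-one renaming encode identically. The sketch below compares whole fragments directly rather than building dup's line-based suffix tree. The tokenizer, the keyword list, and the function names are assumptions.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"if", "else", "for", "while", "return"}  # assumed; other identifiers are parameters


def prev_encode(source: str) -> list:
    """Parameterized 'prev' encoding: each parameter becomes the distance back to its
    previous occurrence (0 on first use); keywords, symbols, and numeric literals are kept verbatim."""
    last_seen: dict[str, int] = {}
    out: list = []
    for pos, tok in enumerate(TOKEN_RE.findall(source)):
        if (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            out.append(pos - last_seen[tok] if tok in last_seen else 0)
            last_seen[tok] = pos
        else:
            out.append(tok)
    return out


def p_match(a: str, b: str) -> bool:
    """True when B equals A under a one-to-one renaming of parameters."""
    return prev_encode(a) == prev_encode(b)
```

For example, `x = x + y;` and `a = a + b;` p-match. `a = b + b;` does not, because the reuse pattern of its names differs.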
Experiment on the automatic detection of function clones in a software system using metrics
The paper presents a technique to automatically identify duplicate and near-duplicate functions in a large software system, using metrics extracted from the source code with the Datrix™ tool.
Java Birthmarks - Detecting the Software Theft -
This work proposes four types of birthmarks for Java class files, each a unique, native characteristic of every class file. It demonstrates that these birthmarks remain effective in the presence of automatic program transformation and compiler-specific issues.
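Once birthmarks have been extracted, two class files are compared by the similarity of their birthmarks. The sketch below assumes that extraction from bytecode has already happened. It shows two illustrative birthmark shapes. A set, such as the names of classes a class file references, is scored with Jaccard similarity. An ordered sequence, such as method calls in order, is scored by position-wise agreement. Both similarity formulas and the function names are assumptions, not the paper's definitions.

```python
def set_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity for set-valued birthmarks; two empty birthmarks count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def sequence_similarity(a: list[str], b: list[str]) -> float:
    """Share of positions where two sequence birthmarks agree, relative to the longer one."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest
```

A score near 1.0 for a suspect pair of class files would suggest one was derived from the other. The threshold for that judgment would have to be set empirically.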
Clone detection using abstract syntax trees
The paper presents simple and practical methods for detecting exact and near-miss clones over arbitrary program fragments in source code by using abstract syntax trees. It also suggests that clone detection could be useful for producing more structured code and, in reverse engineering, for discovering domain concepts and their implementations.