Over the last decade many techniques and tools for software clone detection have been proposed. In this paper, we provide a qualitative comparison and evaluation of the current state-of-the-art in clone detection techniques and tools, and organize the large amount of information into a coherent conceptual framework. We begin with background concepts, a...
Many tasks in software engineering can be characterized as source to source transformations. Design recovery, software restructuring, forward engineering, language translation, platform migration and code reuse can all be understood as transformations from one source text to another. TXL, the Tree Transformation Language, is a programming language and rapid...
This paper describes a rapid prototyping system for extensions to an existing programming language. Such extensions might include new language features or might introduce notation specific to a particular problem domain. The system consists of a dialect description language used to specify the syntax and semantics of extensions, and a context sensitive...
This paper examines the effectiveness of a new language-specific parser-based but lightweight clone detection approach. Exploiting a novel application of a source transformation system, the method accurately finds near-miss clones using an efficient text line comparison technique. The transformation system assists the method in three ways. First, using...
TXL is a special-purpose programming language designed for creating, manipulating and rapidly prototyping language descriptions, tools and applications. TXL is designed to allow explicit programmer control over the interpretation, application, order and backtracking of both parsing and rewriting rules. Using first order functional programming at the higher...
In this poster we present an automated method for empirically evaluating clone detection tools. Our method leverages mutation-based techniques to overcome existing limitations of tool evaluation studies by automatically synthesizing large numbers of known clones based on an editing theory of clone creation. Our framework is effective in measuring recall and...
The optimal number of latent topics required to model the most accurate latent substructure for a source code corpus is an open question in source code analysis. Most estimates about the number of latent topics that exist in a software corpus are based on the assumption that the data is similar to natural language, but there is little empirical evidence to...
The NiCad Clone Detector is a scalable, flexible clone detection tool designed to implement the NiCad (Automated Detection of Near-Miss Intentional Clones) hybrid clone detection method in a convenient, easy-to-use command-line tool that can easily be embedded in IDEs and other environments. It takes as input a source directory or directories to be checked...