Learn More
Over the last decade many techniques and tools for software clone detection have been proposed. In this paper, we provide a qualitative comparison and evaluation of the current state-of-the-art in clone detection techniques and tools, and organize the large amount of information into a coherent conceptual framework. We begin with background concepts, a(More)
Many tasks in software engineering can be characterized as source to source transformations. Design recovery, software restructuring, forward engineering, language translation, platform migration and code reuse can all be understood as transformations from one source text to another. TXL, the Tree Transformation Language, is a programming language and rapid(More)
This paper examines the effectiveness of a new language-specific parser-based but lightweight clone detection approach. Exploiting a novel application of a source transformation system, the method accurately finds near-miss clones using an efficient text line comparison technique. The transformation system assists the method in three ways. First, using(More)
TXL is a special-purpose programming language designed for creating, manipulating and rapidly prototyping language descriptions, tools and applications. TXL is designed to allow explicit programmer control over the interpretation, application , order and backtracking of both parsing and rewriting rules. Using first order functional programming at the higher(More)
The current version of Java (J2SE 5.0) provides a high level of support for concurreny in comparison to previous versions. For example, programmers using J2SE 5.0 can now achieve synchronization between concurrent threads using explicit locks, semaphores, barriers, latches, or ex-changers. Furthermore, built-in concurrent data structures such as hash maps(More)
This paper describes a rapid prototyping system for extensions to an existing programming language. Such extensions might include new language features or might introduce notation specific to a particular problem domain. The system consists of a dialect description language used to specify the syntax and semantics of extensions, and a context sensitive(More)
In this poster we present an automated method for empirically evaluating clone detection tools. Our method leverages mutation-based techniques to overcome existing limitations of tool evaluation studies by automatically synthesizing large numbers of known clones based on an editing theory of clone creation. Our framework is effective in measuring recall and(More)
The optimal number of latent topics required to model the most accurate latent substructure for a source code corpus is an open question in source code analysis. Most estimates about the number of latent topics that exist in a software corpus are based on the assumption that the data is similar to natural language, but there is little empirical evidence to(More)