An Experimental Approach to Detect Similar Web Pages Based on 3-Levels of Similarity Clues
Code clones often lead to code inconsistencies, which eventually increase cost and degrade quality. Web applications have a higher rate of clones than conventional software, so detecting clones in web applications is increasingly necessary. This paper proposes three levels of views for detecting clone pairs in a web application. The proposed technique uses relationships between web pages, passed parameters, and target entities as similarity clues. The experimental results show a trade-off between recall and accuracy. Two approaches, static and dynamic selection, are then suggested for choosing candidate clone pairs. Based on these results, a strategy that combines the three levels of methods with the two candidate-selection approaches is recommended. Finally, the experiments demonstrate the applicability of the proposed approach.
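To make the idea of similarity clues and candidate selection concrete, the following is a minimal illustrative sketch, not the method evaluated in the paper. The abstract does not say how clues are compared or how static and dynamic selection are defined, so everything here is an assumption: each clue is modeled as a set, clue sets are compared with Jaccard similarity, the three scores are averaged without weights, static selection is taken to mean a fixed threshold, and dynamic selection is taken to mean a cutoff derived from the observed scores (their mean). The names `Page`, `clue_scores`, `static_selection`, and `dynamic_selection`, and the sample pages, are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Page:
    """Hypothetical page model: one set per similarity clue."""
    name: str
    links: frozenset      # related (linked) pages
    params: frozenset     # names of passed parameters
    entities: frozenset   # target entities, e.g. database tables


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets score 0.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clue_scores(p, q):
    """One score per clue: (page relationships, parameters, entities)."""
    return (jaccard(p.links, q.links),
            jaccard(p.params, q.params),
            jaccard(p.entities, q.entities))


def score_pairs(pages):
    """Assumed combination rule: unweighted mean of the three clue scores."""
    return {(p.name, q.name): sum(clue_scores(p, q)) / 3
            for p, q in combinations(pages, 2)}


def static_selection(scores, threshold=0.5):
    """Assumed 'static' selection: keep pairs at or above a fixed threshold."""
    return sorted(pair for pair, s in scores.items() if s >= threshold)


def dynamic_selection(scores):
    """Assumed 'dynamic' selection: keep pairs above the mean observed score."""
    if not scores:
        return []
    cutoff = sum(scores.values()) / len(scores)
    return sorted(pair for pair, s in scores.items() if s > cutoff)


# Made-up sample pages, for illustration only.
pages = [
    Page("user_list", frozenset({"menu", "user_edit"}),
         frozenset({"page", "sort"}), frozenset({"USER"})),
    Page("user_search", frozenset({"menu", "user_edit"}),
         frozenset({"page", "sort", "keyword"}), frozenset({"USER"})),
    Page("order_list", frozenset({"menu", "order_edit"}),
         frozenset({"page", "sort"}), frozenset({"ORDER"})),
]
scores = score_pairs(pages)

print(static_selection(scores, threshold=0.5))
print(static_selection(scores, threshold=0.4))
print(dynamic_selection(scores))
```

In this sketch, lowering the fixed threshold from 0.5 to 0.4 admits a second candidate pair. That is the kind of tension the abstract describes: a looser cutoff tends to recover more true clones while also admitting more false candidates.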