A large-scale study of architectural evolution in open-source software systems
Architectural recovery techniques analyze a software system's implementation-level artifacts to suggest its likely architecture. However, different techniques often suggest different architectures for the same system. This makes it difficult to interpret their results and to determine the best technique without significant human intervention. Researchers have tried to assess the quality of recovery techniques by comparing their results with authoritative recoveries: meticulous, labor-intensive recoveries of existing, well-known systems in which one or more engineers are integrally involved. However, these engineers are usually neither the system's original architects nor even its developers. As a result, authoritative recoveries risk missing domain-, application-, and system-context-specific information. To address this problem, we propose a framework comprising a set of principles and a process for recovering a system's ground-truth architecture. The proposed recovery process ensures the accuracy of the obtained architecture by involving the system's architect or engineer in a limited but critical fashion. Our work has the potential to establish a set of "ground truths" for assessing existing and new architectural recovery techniques. We illustrate the framework with a case study of Apache Hadoop.