Ecosystems in GitHub and a Method for Ecosystem Identification Using Reference Coupling
Reverse engineering is an active area of research concerned with discovering techniques and providing tools that support the understanding of software systems. The techniques proposed so far study individual systems in isolation. However, software systems are seldom developed in isolation. Instead, they are often developed together with other projects in the wider context of an organization or a community. We call the collection of projects developed in such a context a software ecosystem. Often, a software ecosystem and the knowledge associated with it are the most valuable assets of its owner. Sometimes the ecosystem is the very reason the organization exists. In this thesis we show that software ecosystems are an interesting and challenging subject of study, and that reverse engineering techniques can be applied beyond the level of individual systems to understand software ecosystems. Our main contributions are threefold: we introduce a methodology for reverse engineering software ecosystems, we provide tools that support the methodology, and we validate the methodology on multiple case studies. Our methodology is based on analyzing the source code and the versioning system repositories of the projects in an ecosystem and generating visual representations of the results. These visual representations present the ecosystem from several complementary perspectives. Given the large amount of information in an ecosystem, we provide exploration mechanisms for navigating it. We distinguish between two dimensions of ecosystem exploration: horizontal exploration allows one to navigate between different views of a given ecosystem, while vertical exploration allows one to dive into the details of individual projects in the ecosystem.
Supporting horizontal exploration is a matter of linking the various ecosystem perspectives in the tool. Supporting vertical exploration implies connecting the ecosystem-level model to the detailed models of the component projects and performing architecture recovery on those models. Since architecture recovery cannot be fully automated, we introduce two techniques that ease the generation of intra-project architectural views. The first technique automates the exploration by classifying modules into a set of structural patterns. The second technique automates the filtering of dependencies in the architectural views by classifying the inter-module dependencies according to their evolution. To validate our contributions we applied our tools and techniques to a set of ecosystem case studies that belong to various organizations: two academic institutions, one industrial software house, and one open-source community. We validated the techniques that work at the architectural level on several well-known open-source software systems.
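
As an illustration of the second technique, the following minimal sketch filters the dependencies shown in an architectural view according to how each dependency evolved across versions. The abstract does not specify the classification scheme, so the three classes used here (STABLE, ADDED, REMOVED) and the names `Dependency`, `classify`, and `filter_view` are hypothetical and chosen only for illustration. Each version is assumed to be available as a set of module-to-module dependencies, ordered from oldest to newest.

```python
from dataclasses import dataclass
from enum import Enum


class Evolution(Enum):
    """Hypothetical evolution classes; the thesis may define different ones."""
    STABLE = "stable"    # present in every analyzed version
    ADDED = "added"      # present in the latest version, missing in an earlier one
    REMOVED = "removed"  # absent from the latest version


@dataclass(frozen=True)
class Dependency:
    """A directed dependency between two modules (assumed representation)."""
    source: str
    target: str


def classify(dep, versions):
    """Classify one dependency given a list of dependency sets, oldest first."""
    presence = [dep in version for version in versions]
    if all(presence):
        return Evolution.STABLE
    if not presence[-1]:
        return Evolution.REMOVED
    return Evolution.ADDED


def filter_view(versions, keep):
    """Return the dependencies whose evolution class is in `keep`."""
    all_deps = set().union(*versions)
    return {dep for dep in all_deps if classify(dep, versions) in keep}


# Example: three versions of a small system with modules ui, core, db, net.
ui_core = Dependency("ui", "core")
core_db = Dependency("core", "db")
core_net = Dependency("core", "net")
history = [
    {ui_core, core_db},
    {ui_core, core_db, core_net},
    {ui_core, core_net},
]
stable_only = filter_view(history, {Evolution.STABLE})
```

Keeping only the stable dependencies is one plausible way to reduce clutter in a view. A different choice of `keep` would instead highlight recent or removed dependencies.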