Software Architecture Recovery through Similarity-Based Graph Clustering

  • Jianlin Zhu, Jin Huang, Daicui Zhou, Zhongbao Yin, Guoping Zhang, Qiang He
  • Int. J. Softw. Eng. Knowl. Eng.
Software architecture recovery aims to gain an architectural-level understanding of a software system whose architecture description does not exist. In recent years, researchers have adopted various software clustering techniques to detect the hierarchical structure of software systems. Most graph clustering techniques focus on the connectivity between program elements but ignore similarity, which is also a key measure for finding the elements of one module. In this paper we propose… 
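The distinction the abstract draws between connectivity and similarity can be illustrated with a small sketch. It is not the paper's method (the abstract is truncated before the proposal); the dependency graph `deps` and the choice of Jaccard overlap as the similarity measure are assumptions made only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical dependency graph: element -> set of elements it uses.
deps = {
    "Parser": {"Lexer", "Ast"},
    "Printer": {"Lexer", "Ast"},
    "Lexer": {"Ast"},
    "Ast": set(),
}


def connected(u, v):
    """Connectivity: is there a direct dependency in either direction?"""
    return v in deps[u] or u in deps[v]


def similarity(u, v):
    """Similarity (assumed measure): overlap of the two dependency sets."""
    return jaccard(deps[u], deps[v])


# Parser and Printer never reference each other, yet they depend on exactly
# the same elements, so a similarity measure would group them together.
print(connected("Parser", "Printer"), similarity("Parser", "Printer"))
# Parser and Lexer are directly connected but share only part of their context.
print(connected("Parser", "Lexer"), similarity("Parser", "Lexer"))
```

A purely connectivity-based clustering sees no link between `Parser` and `Printer`, whereas the similarity view treats them as the strongest candidates for one module.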

Reconstructing and evolving software architectures using a coordinated clustering framework

A framework that assists software engineers in recovering a software project’s architecture from its source code using a novel compartmentalization technique, Coordinated Clustering of Heterogeneous Datasets (CCHD). The technique relies on contextual and structural information in the code base but does not require specific weights for each information type, which allows it to adapt to different project types and domains.

Identifying composite crosscutting concerns through semi‐supervised learning

A semi‐supervised graph clustering approach named constrained authority‐shift clustering is proposed to identify composite CCs and is evaluated on numerous software systems, including a large‐scale distributed software system.

Developer Role Evolution in Open Source Software Ecosystem: An Explanatory Study on GNOME

A case study on the GNOME ecosystem showed that the total number of projects a developer joined, together with some aspects of subjective willingness and project environment, significantly influenced the developers’ chance to evolve into core developers and project leaders.

Graph mining for forensic databases

The findings of this study provide motivational case arguments to support law enforcement and the judicial process with regards to intelligence-led policing and prosecution by the courts.

Improving Similarity Measure for Java Programs Based on Optimal Matching of Control Flow Graphs

Experiments demonstrate that the CFG-Match approach outperforms the comparative approaches in the detection of Java program plagiarism and is more accurate and robust against semantics-preserving transformations.



Comparison of Graph Clustering Algorithms for Recovering Software Architecture Module Views

An empirical study evaluates four clustering algorithms according to three previously proposed criteria (extremity of cluster distribution, authoritativeness, and stability), measured against consecutive releases of four different systems. The results suggest that the k-means algorithm performs best in terms of authoritativeness and extremity, and that the modularization quality algorithm produces more stable clusters.

Hierarchical Clustering for Software Architecture Recovery

This paper provides a detailed analysis of the behavior of various similarity and distance measures that may be employed for software clustering, and analyzes the clustering process of various well-known clustering algorithms by using multiple criteria.

Graph Clustering Based on Structural/Attribute Similarities

This paper proposes a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure, which partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values.

Comparing the decompositions produced by software clustering algorithms using similarity measurements

  • B. Mitchell, S. Mancoridis
  • Computer Science
    Proceedings IEEE International Conference on Software Maintenance. ICSM 2001
  • 2001
It is argued that better similarity measurements can be designed if the relations between the components are considered, and two similarity measurements are proposed that overcome certain problems in existing measurements.

Software components capture using graph clustering

A simple, fast-to-compute, and easy-to-implement method for finding relatively good clusterings of software systems by applying a straightforward metric, MQ, defined in terms of the neighborhoods of an edge's end vertices, to identify the weak edges of the graph.
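The idea of scoring an edge by the neighborhoods of its end vertices and cutting the weak edges can be sketched as follows. The summary does not give the actual metric, so the Jaccard overlap of closed neighborhoods used in `edge_strength`, the `threshold` parameter, and the sample graph `demo_edges` are all assumptions for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def edge_strength(adj, u, v):
    """Assumed metric: Jaccard overlap of the closed neighborhoods of u and v."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def cluster_by_weak_edges(edges, threshold):
    """Drop edges weaker than threshold; connected components are the clusters."""
    adj = build_adjacency(edges)
    strong = {node: set() for node in adj}
    for u, v in edges:
        if edge_strength(adj, u, v) >= threshold:
            strong[u].add(v)
            strong[v].add(u)
    clusters, seen = [], set()
    for start in sorted(strong):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in strong[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical system: two tightly coupled triangles joined by one bridge (c-d).
demo_edges = [("a", "b"), ("b", "c"), ("a", "c"),
              ("c", "d"),
              ("d", "e"), ("e", "f"), ("d", "f")]

print(cluster_by_weak_edges(demo_edges, threshold=0.5))
```

In this sample the bridge `c-d` scores 2/6 because its end vertices share almost no neighbors, so it falls below the threshold and the two triangles separate.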

Experiments with clustering as a software remodularization method

This work confirms the importance of a proper description scheme of the entities being clustered, lists a few good coupling metrics to use, characterizes the quality of different clustering algorithms, and proposes novel description schemes not directly based on the source code.

Multiple layer clustering of large software systems

Comparison with existing software clustering algorithms indicates that MULICsoft is able to produce decompositions that are close to those created by system experts.

Scalable graph clustering using stochastic flows: applications to community discovery

A multi-level algorithm for graph clustering using flows that delivers significant improvements in both quality and speed when compared to state-of-the-art algorithms.

Bunch: a clustering tool for the recovery and maintenance of software system structures

  • S. Mancoridis, B. Mitchell, Y. Chen, E. Gansner
  • Computer Science
    Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360)
  • 1999
A clustering tool called Bunch is developed that creates a system decomposition automatically by treating clustering as an optimization problem. A feature that integrates designer knowledge about the system structure into the otherwise fully automatic clustering process is also described.
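Treating clustering as an optimization problem can be sketched as an objective function over partitions plus a search that improves it. The summary names neither, so both parts below are assumptions: the objective is a modularization-quality-style score that rewards intra-cluster edges and penalizes inter-cluster edges, and the search is a plain single-node hill climb. The names `modularization_quality`, `hill_climb`, `demo_edges`, and `start` are hypothetical.

```python
def modularization_quality(edges, cluster_of):
    """Sum over clusters of 2*intra / (2*intra + inter); assumed objective."""
    intra, inter = {}, {}
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
        else:
            # An inter-cluster edge counts against both clusters it touches.
            inter[cu] = inter.get(cu, 0) + 1
            inter[cv] = inter.get(cv, 0) + 1
    score = 0.0
    for c in set(cluster_of.values()):
        mu = intra.get(c, 0)
        if mu:
            score += 2 * mu / (2 * mu + inter.get(c, 0))
    return score


def hill_climb(edges, cluster_of):
    """Move one node at a time to another cluster while the score improves."""
    current = dict(cluster_of)
    best = modularization_quality(edges, current)
    improved = True
    while improved:
        improved = False
        labels = sorted(set(current.values()))
        for node in sorted(current):
            keep = current[node]
            for label in labels:
                if label == keep:
                    continue
                current[node] = label
                score = modularization_quality(edges, current)
                if score > best + 1e-12:
                    best, keep, improved = score, label, True
            current[node] = keep  # settle on the best label found for this node
    return current, best


# Hypothetical dependency edges: two triangles joined by the edge c -> d.
demo_edges = [("a", "b"), ("b", "c"), ("a", "c"),
              ("c", "d"),
              ("d", "e"), ("e", "f"), ("d", "f")]

# Deliberately poor starting partition: d is placed with the wrong triangle.
start = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1}
result, quality = hill_climb(demo_edges, start)
print(result, round(quality, 3))
```

A designer's knowledge could be folded into such a search by pinning selected nodes to fixed clusters and skipping them in the move loop, which is one plausible reading of the integration feature the summary mentions.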