TNM: A Tool for Mining of Socio-Technical Data from Git Repositories

Nikolai Sviridov, Mikhail Evtikhiev, Vladimir Kovalenko
2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
Networks of collaboration between engineers are reflected in traces of developers’ activity in version control systems (VCSs). Extracting data from Git repositories is an essential task for researchers and practitioners working on socio-technical analysis, but it requires substantial engineering work. Despite growing interest in analysing socio-technical data and applying it in practice, there are no flexible and easily reusable tools for retrieving socio-technical information from VCSs. With no… 
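As a rough illustration of the kind of extraction the abstract refers to (not TNM's own implementation or interface), the sketch below turns raw `git log` output into a developer-by-file matrix, a common starting point for socio-technical analysis. It assumes the log was produced with `git log --name-only --pretty=format:'>%H%x09%ae'`, so that each commit header starts with a `>` marker followed by the hash and author e-mail separated by a tab; the marker and the function names are hypothetical choices for this example. Paths that git quotes or that begin with `>` are not handled.

```python
from collections import defaultdict

# Hypothetical header marker; assumes the log was produced with
#   git log --name-only --pretty=format:'>%H%x09%ae'
# so every commit header line starts with '>' and holds "<hash>\t<email>".
HEADER_MARK = ">"


def parse_git_log(text):
    """Return a list of (commit_hash, author_email, changed_files) tuples."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            # Blank separator lines are skipped, wherever git places them.
            continue
        if line.startswith(HEADER_MARK):
            sha, email = line[len(HEADER_MARK):].split("\t", 1)
            commits.append((sha, email, []))
        elif commits:
            # Any other non-blank line is a path changed by the current commit.
            commits[-1][2].append(line)
    return commits


def developer_file_matrix(commits):
    """Map each author to {file: number of commits touching that file}."""
    matrix = defaultdict(lambda: defaultdict(int))
    for _sha, email, files in commits:
        for path in files:
            matrix[email][path] += 1
    return {dev: dict(counts) for dev, counts in matrix.items()}


sample = (
    ">a1\talice@example.com\nsrc/main.py\nREADME.md\n\n"
    ">b2\tbob@example.com\nsrc/main.py\n"
)
print(developer_file_matrix(parse_git_log(sample)))
# {'alice@example.com': {'src/main.py': 1, 'README.md': 1}, 'bob@example.com': {'src/main.py': 1}}
```

Because blank lines are ignored, the parser does not depend on exactly where git inserts separator lines between commits.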


GitDelver Enterprise Dataset (GDED): An Industrial Closed-source Dataset for Socio-Technical Research
This work mined 101 repositories and produced the GDED dataset, which contains socio-technical information about 106,216 commits, 470,940 file modifications, and 3,471,556 method modifications by 164 developers over the last 13 years, across various programming languages.
Query-Analysis on Git-Attributes in Relational and Graph-DB (2022)


PyDriller: Python framework for mining software repositories
PyDriller, a Python framework that eases the process of mining Git, is presented and compared against GitPython, the state-of-the-art Python framework; the comparison shows that PyDriller achieves the same results with, on average, 50% fewer lines of code and significantly lower complexity.
Perceval: Software Project Data at Your Will
Perceval is an industry-strength free software tool that has been widely used at Bitergia, a company devoted to offering commercial analytics of software projects; it hides the technical complexities of data acquisition and eases the definition of analytics.
Assessing the bus factor of Git repositories
Presents a tool that, given a Git-based repository, automatically measures the bus factor for any file, directory, and branch in the repository, as well as for the project itself. The tool can also simulate what would happen to the project if one or more developers disappeared.
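The "what if developers disappear" simulation can be illustrated with a small sketch. It assumes file authorship has already been computed (for example from commit history) and is given as a mapping from file to a set of authors. The greedy bus-factor estimate below is a simplified heuristic in the spirit of later truck-factor work, swapped in for illustration; it is not the algorithm of the tool described above, and both function names are hypothetical.

```python
from collections import Counter


def orphaned_files(file_authors, departed):
    """Files left with no known author once the `departed` developers leave."""
    return sorted(f for f, devs in file_authors.items() if not (devs - departed))


def greedy_bus_factor(file_authors, threshold=0.5):
    """Remove top authors one at a time until more than `threshold` of files are orphaned.

    Simplified greedy heuristic (an assumption, not the paper's algorithm).
    Returns the number of developers that had to be removed.
    """
    if not file_authors:
        return 0
    departed = set()
    total = len(file_authors)
    while len(orphaned_files(file_authors, departed)) / total <= threshold:
        # Count, per remaining developer, how many files they still cover.
        counts = Counter(
            dev for devs in file_authors.values() for dev in devs - departed
        )
        if not counts:
            break
        top, _ = counts.most_common(1)[0]
        departed.add(top)
    return len(departed)


ownership = {
    "a.py": {"alice"},
    "b.py": {"alice"},
    "c.py": {"alice", "bob"},
    "d.py": {"bob"},
}
print(orphaned_files(ownership, {"alice"}))  # ['a.py', 'b.py']
print(greedy_bus_factor(ownership))  # 2
```

Here losing Alice orphans exactly half of the files, which does not exceed the 50% threshold, so Bob must also leave before the project is considered at risk.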
The promises and perils of mining git
This work focuses on Git, a very popular distributed source code management system (DSCM) used in high-profile projects, and aims to help researchers interested in DSCMs avoid perils when mining and analyzing Git data.
Socio-technical congruence: a framework for assessing the impact of technical and work dependencies on software development productivity
This paper argues that modularization, the traditional technique for reducing interdependencies among the components of a system, has serious limitations in software development. Building on the idea of congruence proposed in prior work, it examines the relationship between the structure of technical dependencies and the structure of work dependencies.
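The congruence measure in this line of work is usually summarized as follows. With a task-assignment matrix $T_A$ (developers × tasks) and a task-dependency matrix $T_D$ (tasks × tasks), the coordination requirements among developers are $C_R$; congruence is the share of those requirements that are matched by actual coordination $C_A$ (developers × developers). The symbol names and the restriction to off-diagonal pairs are this sketch's notation; the paper's exact definition may differ in detail.

```latex
C_R = T_A \, T_D \, T_A^{\top},
\qquad
\operatorname{congruence}(C_R, C_A) =
\frac{\bigl|\{(i,j) : i \neq j,\ C_R[i,j] > 0,\ C_A[i,j] > 0\}\bigr|}
     {\bigl|\{(i,j) : i \neq j,\ C_R[i,j] > 0\}\bigr|}
```

A value of 1 means every pair of developers whose work is interdependent also coordinated; lower values point to coordination gaps.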
Using Software Repositories to Investigate Socio-technical Congruence in Development Projects
The paper shows how the information needed for a quantitative measure of socio-technical congruence can be mined from commonly used software repositories, and how socio-technical congruence can be computed from that information.
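A minimal sketch of that mining step, under simplifying assumptions: commit records are given as hypothetical `(author, files)` pairs, two developers are taken to need coordination if they modified at least one common file (task dependencies reduced to "same file"), and actual coordination is a set of developer pairs observed communicating, for example in issue or mailing-list threads. The record format and function names are illustrative, not taken from the paper.

```python
from itertools import combinations


def coordination_requirements(commits):
    """Unordered developer pairs that modified at least one common file.

    Simplifying assumption: task dependencies are reduced to "same file",
    i.e. the task-dependency matrix is treated as the identity.
    """
    devs_per_file = {}
    for author, files in commits:
        for path in files:
            devs_per_file.setdefault(path, set()).add(author)
    required = set()
    for devs in devs_per_file.values():
        for a, b in combinations(sorted(devs), 2):
            required.add(frozenset((a, b)))
    return required


def congruence(required, actual):
    """Share of required developer pairs that actually communicated."""
    if not required:
        raise ValueError("no coordination requirements to compare against")
    return len(required & actual) / len(required)


commits = [
    ("alice", ["a.py", "b.py"]),
    ("bob", ["a.py"]),
    ("carol", ["b.py"]),
    ("dave", ["c.py"]),
]
# Actual coordination, e.g. pairs seen together in issue or mailing-list threads.
actual = {frozenset(("alice", "bob")), frozenset(("bob", "dave"))}
print(congruence(coordination_requirements(commits), actual))  # 0.5
```

Alice must coordinate with both Bob and Carol, but only the Alice and Bob pair communicated, so congruence is 1/2.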
Revisiting the applicability of the pareto principle to core development teams in open source software projects
The findings suggest that the Pareto principle does not hold for the core teams of many GitHub projects, and that several of the studied projects are susceptible to the “bus factor”, where the departure of a core developer would be quite harmful.
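One way to check the Pareto (80/20) principle on commit data is to find the smallest group of top committers that together produce at least 80% of the commits, and then compare the size of that group with 20% of all developers. The sketch below assumes a plain list of commit authors as input; the function name is hypothetical, and `Fraction` is used so the 80% threshold is compared exactly rather than with floating-point rounding.

```python
from collections import Counter
from fractions import Fraction


def core_developers(commit_authors, share=Fraction(4, 5)):
    """Smallest set of top committers that together reach `share` of all commits."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    core, covered = [], 0
    for dev, n in counts.most_common():
        if covered >= share * total:
            break
        core.append(dev)
        covered += n
    return core


authors = ["alice"] * 6 + ["bob"] * 2 + ["carol", "dave"]
core = core_developers(authors)
print(core, len(core) / len(set(authors)))  # ['alice', 'bob'] 0.5
```

In this toy project, half of the developers are needed to reach 80% of the commits, so the 80/20 split does not hold.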
CVS release history data for detecting logical couplings
The software evolution analysis approach enabled the authors to detect shortcomings of PACS, such as architectural weaknesses, poorly designed inheritance hierarchies, and blurred module interfaces.
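Logical coupling means that files are repeatedly changed together, even when they have no explicit code dependency. CVS records changes per file, so grouped change sets first have to be reconstructed from the history; the sketch below assumes that step is already done and simply counts how often each pair of files co-changes, keeping the pairs above a minimum support. The support threshold and function name are illustrative assumptions, not the paper's exact parameters.

```python
from collections import Counter
from itertools import combinations


def logical_couplings(change_sets, min_support=2):
    """Pairs of files changed together in at least `min_support` change sets."""
    pair_counts = Counter()
    for files in change_sets:
        # Sort so that each unordered pair has one canonical key.
        for pair in combinations(sorted(set(files)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


change_sets = [
    ["ui.c", "model.c"],
    ["model.c", "ui.c", "db.c"],
    ["db.c"],
    ["db.c", "log.c"],
]
print(logical_couplings(change_sets))  # {('model.c', 'ui.c'): 2}
```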
A degree-of-knowledge model to capture source code familiarity
The paper shows that the degree-of-knowledge model can give better results than an existing expertise-finding approach, and reports case studies that use the model to support knowledge transfer and to identify changes of interest.
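In the degree-of-knowledge model, a developer's familiarity with a code element roughly combines a degree-of-authorship (DOA) component, computed from version history, with a degree-of-interest component from interaction data. The sketch below covers only the DOA part, as a linear model over first authorship (FA), the developer's own changes (deliveries, DL), and changes by others (acceptances, AC). The coefficients are the values commonly quoted in follow-up work that builds on this model; treat them as an assumption and verify them against the original publication.

```python
import math


def degree_of_authorship(first_authorship, deliveries, acceptances):
    """DOA = 3.293 + 1.098*FA + 0.164*DL - 0.321*ln(1 + AC).

    FA: 1 if the developer created the element, else 0.
    DL: number of changes (deliveries) the developer made to it.
    AC: number of changes (acceptances) made to it by other developers.
    Coefficients as quoted in follow-up work; verify against the original paper.
    """
    fa = 1 if first_authorship else 0
    return 3.293 + 1.098 * fa + 0.164 * deliveries - 0.321 * math.log(1 + acceptances)


print(round(degree_of_authorship(True, 0, 0), 3))  # 4.391
print(round(degree_of_authorship(False, 0, 5), 3))
```

Creating a file and changing it raise the score, while changes by other developers lower it logarithmically, so ownership fades gradually as others take over.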