Identifying Metrics' Biases When Measuring or Approximating Size in Heterogeneous Languages
Data from several projects show a significant relationship between the size of a module and its defect density. Here we address the implications of this observation. Does the overall defect density of a software project vary with its module size distribution? Even more interesting is the question: can we exploit this dependence to reduce the total number of defects? We examine the available data sets and propose a model relating module size and defect density. The model takes into account defects that arise from the interconnections among modules as well as defects that arise from the complexity of individual modules. Model parameters are estimated from actual data. We then present a key observation that allows this model to be used not only for estimating defect density but also, potentially, for optimizing a design to minimize defects. This observation, supported by several of the data sets examined, is that module sizes often follow an exponential distribution. We show how the two models, used together, provide a way of projecting the variation in defect density. We also consider the possibility of minimizing defect density by controlling the module size distribution.
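The abstract does not state the functional form of the size/defect-density model, so the sketch below assumes one purely for illustration: D(s) = a/s + b + c·s. Here a/s stands for interconnection defects (a roughly fixed number per module, spread over its size), b is a constant baseline, and c·s stands for complexity defects that grow with module size. The parameters a, b, c and all function names are hypothetical. A module of size s then contributes D(s)·s = a + b·s + c·s² defects. If module sizes are exponential with mean μ, then E[s] = μ and E[s²] = 2μ², so the project-wide density (total defects divided by total size) is a/μ + b + 2cμ. That expression is minimized at μ* = √(a/(2c)). The code checks the closed form against a direct numerical integration.

```python
import math


def defect_density(s, a, b, c):
    """Assumed (hypothetical) per-module density D(s) = a/s + b + c*s."""
    return a / s + b + c * s


def overall_density_numeric(mean_size, a, b, c, steps=100_000, upper_factor=40.0):
    """Total defects / total size for exponentially distributed module sizes.

    Uses the midpoint rule on [0, upper_factor * mean_size]; the tail beyond
    that point carries a negligible fraction (about e^-40) of the probability.
    """
    lam = 1.0 / mean_size
    h = upper_factor * mean_size / steps
    defects = 0.0
    size = 0.0
    for i in range(steps):
        s = (i + 0.5) * h                      # midpoint, so s > 0 and a/s is finite
        weight = lam * math.exp(-lam * s) * h  # probability mass near s
        defects += defect_density(s, a, b, c) * s * weight
        size += s * weight
    return defects / size


def overall_density_closed_form(mean_size, a, b, c):
    """(a + b*mu + 2*c*mu^2) / mu, using E[s] = mu and E[s^2] = 2*mu^2."""
    return a / mean_size + b + 2.0 * c * mean_size


def optimal_mean_size(a, c):
    """Mean module size minimizing a/mu + b + 2*c*mu (set derivative to zero)."""
    return math.sqrt(a / (2.0 * c))


if __name__ == "__main__":
    a, b, c = 10.0, 0.5, 0.001  # illustrative values only, not fitted to any data
    for mu in (25, 50, 100, 200, 400):
        print(mu, overall_density_closed_form(mu, a, b, c),
              overall_density_numeric(mu, a, b, c))
    mu_star = optimal_mean_size(a, c)
    print("optimal mean size:", mu_star, overall_density_closed_form(mu_star, a, b, c))
```

Under these made-up parameters, mean sizes of 50 and 100 both give an overall density of 0.8. The minimum, about 0.783, falls at μ* ≈ 70.7. This illustrates how, under the assumed form, very small modules are penalized by interconnection defects and very large ones by complexity defects. The real model's form and fitted parameters may differ.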