Better cross company defect prediction


How can we find data for quality prediction? Early in the life cycle, projects may lack the data needed to build such predictors. Prior work assumed that the most relevant training data lies nearest to the local project. But is this the best approach? This paper introduces the Peters filter, which is based on the following conjecture: when local data is scarce, more information exists in other projects. Accordingly, this filter selects training data via the structure of other projects. To assess the performance of the Peters filter, we compare it with two other approaches for quality prediction: within-company learning, and cross-company learning with the Burak filter (the state-of-the-art relevancy filter). This paper finds that: 1) within-company predictors are weak for small data sets; 2) the Peters filter+cross-company builds better predictors than both within-company and the Burak filter+cross-company; and 3) the Peters filter builds 64% more useful predictors than both the within-company and the Burak filter+cross-company approaches. Hence, we recommend the Peters filter for cross-company learning.
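The contrast between letting local data drive the selection and letting other projects' data drive it can be illustrated with a small sketch. This is not the paper's published procedure: the function names, the Euclidean distance, the parameter `k`, and the toy rows are all assumptions made for illustration, and details a real filter would need (such as feature scaling or any clustering step) are left out.

```python
import math

def nearest(point, candidates):
    """Return the index of the candidate closest to point (Euclidean)."""
    return min(range(len(candidates)),
               key=lambda i: math.dist(point, candidates[i]))

def local_driven_filter(local_rows, cross_rows, k=2):
    """Hypothetical local-driven selection: every local row picks its
    k nearest cross-company rows, so scarce local data steers the choice."""
    chosen = set()
    for row in local_rows:
        ranked = sorted(range(len(cross_rows)),
                        key=lambda i: math.dist(row, cross_rows[i]))
        chosen.update(ranked[:k])
    return sorted(chosen)

def cross_driven_filter(local_rows, cross_rows):
    """Hypothetical cross-driven selection: every cross-company row names
    its nearest local row; each named local row then keeps only the
    closest cross-company row that named it."""
    fans = {}  # local index -> cross indices that named it
    for ci, crow in enumerate(cross_rows):
        fans.setdefault(nearest(crow, local_rows), []).append(ci)
    chosen = set()
    for li, cis in fans.items():
        chosen.add(min(cis,
                       key=lambda ci: math.dist(local_rows[li], cross_rows[ci])))
    return sorted(chosen)

# Toy data (assumed): two local rows, four cross-company rows.
local = [(0.0, 0.0), (10.0, 10.0)]
cross = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (50.0, 50.0)]

print(local_driven_filter(local, cross, k=2))  # [0, 1, 2]
print(cross_driven_filter(local, cross))       # [0, 2]
```

In this toy case, the cross-driven variant returns a smaller training set because each local row keeps a single cross-company row. Whether either selection yields better defect predictors is an empirical question that the sketch does not address.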



Cite this paper

@article{Peters2013BetterCC,
  title={Better cross company defect prediction},
  author={Fayola Peters and Tim Menzies and Andrian Marcus},
  journal={2013 10th Working Conference on Mining Software Repositories (MSR)},
  year={2013},
  pages={409-418}
}