Defect prediction from static code features: current results, limitations, new approaches

Abstract

Building quality software is expensive and software quality assurance (QA) budgets are limited. Data miners can learn defect predictors from static code features, and these predictors can be used to direct QA resources; e.g. to focus on the parts of the code predicted to be more defective. Recent results show that better data mining technology is not leading to better defect predictors. We hypothesize that we have reached the limits of the standard learning goal of maximizing “AUC(pd, pf)”; i.e. the area under the curve (AUC) of the probability of false alarm (pf) versus the probability of detection (pd). Accordingly, we explore changing the standard goal. Learners that maximize “AUC(effort, pd)” find the smallest set of modules that contain the most errors. WHICH is a meta-learner framework that can be quickly customized to different goals. When customized to AUC(effort, pd), WHICH out-performs all the data mining methods studied here. More importantly, measured in terms of this new goal, certain widely used learners perform much worse than simple manual methods. Hence, we advise against the indiscriminate use of learners. Learners must be chosen and customized to the goal at hand. With the right architecture (e.g. WHICH), tuning a learner to specific local business goals can be a simple task.
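
To make the AUC(effort, pd) goal concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that effort is measured as the fraction of lines of code (LOC) inspected, that modules are inspected in descending order of a predicted score, and that pd is the fraction of defective modules found so far. The function name `auc_effort_pd` and its parameters are hypothetical, introduced only for illustration.

```python
from typing import Sequence


def auc_effort_pd(scores: Sequence[float],
                  loc: Sequence[int],
                  defective: Sequence[bool]) -> float:
    """Area under the (effort, pd) curve for one ranking of modules.

    Assumptions (not taken from the paper's text):
      - effort = cumulative fraction of LOC inspected,
      - pd     = cumulative fraction of defective modules found,
      - modules are inspected from highest to lowest score.
    """
    total_loc = sum(loc)
    total_defective = sum(1 for d in defective if d)
    if total_loc <= 0 or total_defective == 0:
        raise ValueError("need positive total LOC and at least one defective module")

    # Inspect the modules the predictor considers most suspicious first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    area = 0.0
    effort = 0.0  # x-axis: fraction of LOC read so far
    pd = 0.0      # y-axis: fraction of defective modules found so far
    for i in order:
        next_effort = effort + loc[i] / total_loc
        next_pd = pd + (1.0 / total_defective if defective[i] else 0.0)
        # Trapezoid between the previous point and the new point.
        area += (next_effort - effort) * (pd + next_pd) / 2.0
        effort, pd = next_effort, next_pd
    return area


# Two modules: a small defective one (10 LOC) and a large clean one (90 LOC).
small_first = auc_effort_pd([0.9, 0.1], [10, 90], [True, False])
large_first = auc_effort_pd([0.1, 0.9], [10, 90], [True, False])
```

Under these assumptions, the ranking that inspects the small defective module first (`small_first`) yields a larger area than the ranking that reads the large clean module first (`large_first`), which matches the stated goal of finding the most errors in the smallest set of modules.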

DOI: 10.1007/s10515-010-0069-5

15 Figures and Tables

Citations per Year (chart, 2010 to 2017)

144 Citations

Semantic Scholar estimates that this publication has 144 citations based on the available data.

Cite this paper

@article{Menzies2010DefectPF,
  title={Defect prediction from static code features: current results, limitations, new approaches},
  author={Tim Menzies and Zach Milton and Burak Turhan and Bojan Cukic and Yue Jiang and Ayse Basar Bener},
  journal={Automated Software Engineering},
  year={2010},
  volume={17},
  pages={375-407}
}