Data Set Used
The value of using static code attributes to learn defect predictors has been widely debated. Prior work has explored issues like the merits of "McCabes versus Halstead versus lines of code counts" for generating defect predictors. We show here that such debates are irrelevant since how the attributes are used to build predictors is much more important than… (More)
We propose a practical defect prediction approach for companies that do not track defect related data. Specifically, we investigate the applicability of cross-company (CC) data for building localized defect predictors using static code features. Firstly, we analyze the conditions, where CC data can be used as is. These conditions turn out to be quite few.… (More)
In mission critical systems, such as those developed by NASA, it is very important that the test engineers properly recognize the severity of each issue they identify during testing. Proper severity assessment is essential for appropriate resource allocation and planning for fixing activities and additional testing. Severity assessment is strongly… (More)
Software design is a process of trading off competing objectives. If the user objective space is rich, then we should use optimizers that can fully exploit that richness. For example, this study configures software product lines (expressed as feature maps) using various search-based software engineering methods. As we increase the number of optimization… (More)
Building quality software is expensive and software quality assurance (QA) budgets are limited. Data miners can learn defect predictors from static code features which can be used to control QA resources; e.g. to focus on the parts of the code predicted to be more defective. Recent results show that better data mining technology is not leading to better… (More)
Zhang and Zhang argue that predictors are useless unless they have high precison&recall. We have a different view, for two reasons. First, for SE data sets with large neg/pos ratios, it is often required to lower precision to achieve higher recall. Second, there are many domains where low precision detectors are useful.
Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek aspects on general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogy-based effort estimation, i.e., the immediate neighbors of a project offer… (More)
Data miners can infer rules showing how to improve either (a) the effort estimates of a project or (b) the defect predictions of a software module. Such studies often exhibit conclusion instability regarding what is the most effective action for different projects or modules.
Effort estimation often requires generalizing from a small number of historical projects. Generalization from such limited experience is an inherently underconstrained problem. Hence, the learned effort models can exhibit large deviations that prevent standard statistical methods (e.g., t-tests) from distinguishing the performance of alternative… (More)
Context: There are many methods that input static code features and output a predictor for faulty code modules. These data mining methods have hit a "performance ceiling"; i.e., some inherent upper bound on the amount of information offered by, say, static code features when identifying modules which contain faults. Objective: We seek an explanation for… (More)