- Full text PDF available (9)
- This year (0)
- Last five years (0)
Data Set Used
We propose a practical defect prediction approach for companies that do not track defect related data. Specifically, we investigate the applicability of cross-company (CC) data for building localized defect predictors using static code features. Firstly, we analyze the conditions, where CC data can be used as is. These conditions turn out to be quite few.… (More)
Zhang and Zhang argue that predictors are useless unless they have high precison&recall. We have a different view, for two reasons. First, for SE data sets with large neg/pos ratios, it is often required to lower precision to achieve higher recall. Second, there are many domains where low precision detectors are useful.
Assessing software costs money and better assessment costs exponentially more money. Given finite budgets, assessment resources are typically skewed towards areas that are believed to be mission critical. This leaves blind spots: portions of the system that may contain defects which may be missed. Therefore, in addition to rigorously assessing mission… (More)
Within NASA, there is an increasing awareness that software is of growing importance to the success of missions. Much data has been collected, and many theories have been advanced on how to reduce or eliminate errors in code. However, learning requires experience. This article documents a new NASA initiative to build a centralized repository of software… (More)
Traditional methods of generating quality code indicators (e.g. linear regression, decision tree induction) can be demonstrated to be inappropriate for IV&V purposes. IV&V is a unique aspect of the software lifecycle, and different methods are necessary to produce quick and accurate results. If quality code indicators could be produced on a per-project… (More)
Software repositories plus defect logs are useful for learning defect detectors. Such defect detectors could be a useful resource allocation tool for software managers. One way to view our detectors is that they are a V&V tool for V&V; i.e. they can be used to assess if " too much " of the testing budget is going to " too little " of the system. Finding… (More)
—Numerous discrepancies exist between expert opinion and empirical data reported in Morisio et al.'s recent TSE article. The differences related to what factors encouraged successful reuse in software organizations. This note describes how those differences were detected and comments on their methodological implications.
There are many machine learning algorithms currently available. In the 21st century, the problem no longer lies in writing the learner, but in choosing which learners to run on a given data set. In this paper, we argue that the final choice of learners should not be exclusive; in fact, there are distinct advantages in running data sets through multiple… (More)