Justin S. Di Stefano

We propose a practical defect prediction approach for companies that do not track defect-related data. Specifically, we investigate the applicability of cross-company (CC) data for building localized defect predictors using static code features. First, we analyze the conditions under which CC data can be used as is. These conditions turn out to be quite few.
Zhang and Zhang argue that predictors are useless unless they have high precision and recall. We take a different view, for two reasons. First, for SE data sets with large negative-to-positive class ratios (neg/pos), it is often necessary to accept lower precision in order to achieve higher recall. Second, there are many domains where low-precision detectors are useful.
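The first point follows from standard confusion-matrix arithmetic. With P defective and N defect-free modules, a detector with recall (probability of detection) `recall` and false-alarm rate `pf` produces recall·P true positives and pf·N false positives. Its precision is therefore 1 / (1 + (N/P)·(pf/recall)). When N/P is large, even a small rise in `pf` drags precision down. The sketch below is our own illustration of that identity, not code from the paper. The function name and the operating points are hypothetical and chosen only to show the effect.

```python
def precision_from_rates(recall: float, pf: float, neg_pos_ratio: float) -> float:
    """Precision implied by recall, false-alarm rate pf, and the class ratio N/P.

    precision = TP / (TP + FP)
              = recall*P / (recall*P + pf*N)
              = 1 / (1 + (N/P) * (pf / recall))
    """
    if not (0.0 <= recall <= 1.0 and 0.0 <= pf <= 1.0):
        raise ValueError("recall and pf must lie in [0, 1]")
    if recall == 0.0:
        return 0.0  # no true positives; treat precision as 0 by convention
    return 1.0 / (1.0 + neg_pos_ratio * (pf / recall))


if __name__ == "__main__":
    # Hypothetical operating points on a data set where 10% of modules are
    # defective, i.e. neg/pos = N/P = 9.
    ratio = 9.0
    for recall, pf in [(0.4, 0.05), (0.8, 0.20)]:
        p = precision_from_rates(recall, pf, ratio)
        print(f"recall={recall:.2f} pf={pf:.2f} -> precision={p:.2f}")
```

Running it prints `precision=0.47` for the first point and `precision=0.31` for the second. Doubling recall costs about a third of the precision here, because each false alarm is multiplied by the ninefold excess of defect-free modules.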
Within NASA, there is an increasing awareness that software is of growing importance to the success of missions. Much data has been collected, and many theories have been advanced on how to reduce or eliminate errors in code. However, learning requires experience. This article documents a new NASA initiative to build a centralized repository of software…
Software repositories plus defect logs are useful for learning defect detectors. Such defect detectors could be a useful resource allocation tool for software managers. One way to view our detectors is as a V&V tool for V&V; i.e., they can be used to assess whether "too much" of the testing budget is going to "too little" of the system. Finding such…
Traditional methods of generating quality code indicators (e.g., linear regression, decision tree induction) can be shown to be inappropriate for IV&V purposes. IV&V is a unique aspect of the software lifecycle, and different methods are needed to produce quick and accurate results. If quality code indicators could be produced on a per-project…
There are many machine learning algorithms currently available. In the 21st century, the problem no longer lies in writing the learner, but in choosing which learners to run on a given data set. In this paper, we argue that the final choice of learners should not be exclusive; in fact, there are distinct advantages in running data sets through multiple…