Data Quality: Some Comments on the NASA Software Defect Datasets

Martin J. Shepperd, Qinbao Song, Zhongbin Sun, Carolyn Mair
IEEE Transactions on Software Engineering, vol. 39, no. 9, September 2013

Background: Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and using the same rather than similar versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research. Objective: This short note investigates the extent to which…



Publications citing this paper.

Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models

2015 IEEE/ACM 37th IEEE International Conference on Software Engineering • 2015

How Far We Have Progressed in the Journey? An Examination of Cross-Project Defect Prediction

ACM Trans. Softw. Eng. Methodol. • 2018

Semantic Scholar estimates that this publication has 228 citations based on the available data.



Publications referenced by this paper.

The Scientific Method in Practice: Reproducibility in the Computational Sciences

V. Stodden
MIT Sloan School Working Paper 4773-10, 2010
