An Empirical Investigation to Overcome Class-Imbalance in Inspection Reviews

  • Maninder Singh, Gursimran Singh Walia, Anurag Goswami
  • 2017 International Conference on Machine Learning and Data Science (MLDS)
Background: Software inspection results in reviews that report the presence of faults. The requirements author must manually read through the reviews and differentiate between true faults and false positives. Problem: Post-inspection decisions (fault or non-fault) are difficult and time-consuming. It is difficult to apply machine learning (ML) techniques directly to raw (unstructured) data because of the class-imbalance problem and possible fault slippage through misclassification of faults. Aim: The…
Automated Validation of Requirement Reviews: A Machine Learning Approach
  • Maninder Singh
  • Computer Science
  • 2018 IEEE 26th International Requirements Engineering Conference (RE)
  • 2018
This research applies various classification approaches, NL processing with semantic analysis, and mining solutions from graph theory to requirement reviews and NL requirements, in order to automate the validation of inspection reviews and find common patterns that describe high-quality requirements.
Using Supervised Learning to Guide the Selection of Software Inspectors in Industry
This study analyzes the reading patterns (RPs) of inspectors recorded by eye-tracking equipment and evaluates their abilities to find various fault types, showing that the approach could guide inspector selection with an accuracy ranging between 79.3% and 94% for various fault types.
Automating Key Phrase Extraction from Fault Logs to Support Post-Inspection Repair of Software Requirements
This research paper aims to develop an automated approach for identifying fault-prone requirements in a software requirement specification (SRS) document to mitigate fault propagation to later…
Using Semantic Analysis and Graph Mining Approaches to Support Software Fault Fixation
  • Maninder Singh, G. Walia
  • Computer Science
  • 2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)
  • 2020
The proposed approach employs NL processing, machine learning, semantic analysis, and graph mining to generate a graph of inter-related requirements (IRR) based on semantic similarity scores, so that highly similar requirements can be identified accurately to support the CIA.
A Vertical Breadth-First Multilevel Path Algorithm to Find All Paths in a Graph
A novel approach called vertical breadth-first tree that utilizes vertical data structures to find all-length paths (including shortest paths) for all pairs of vertices in a graph is presented.


An Empirical Study on Improving Severity Prediction of Defect Reports Using Feature Selection
This study discusses whether feature selection can benefit the severity prediction task, using three commonly used feature selection schemes (Information Gain, Chi-Square, and Correlation Coefficient) with the Multinomial Naive Bayes classification approach.
Combining text mining and data mining for bug report classification
A multi-stage approach combines text mining and data mining techniques to automate the prediction process for bug reports; the study also empirically examines the impact relation between the underlying classifiers and various other properties of the combined model.
Comprehensible software fault and effort prediction: A data mining approach
Surprisingly, the trees extracted from the black-box models by ALPA are not only comprehensible and explain how the black-box model makes (most of) its predictions, but are also more accurate than the trees obtained by working directly on the data.
Solving the class imbalance problems using RUSMultiBoost ensemble
This work proposes RUSMultiBoost, a hybrid method that combines the MultiBoost ensemble with random undersampling (RUS) to solve the class imbalance problem, and shows that the hybrid ensemble method performs significantly better than other methods on benchmark data sets in terms of G-mean, Sensitivity, and F1-measure.
Characteristics of Useful Code Reviews: An Empirical Study at Microsoft
The proportion of useful comments made by a reviewer increases dramatically in the first year that he or she is at Microsoft but tends to plateau afterwards, and it is found that the more files that are in a change, the lower the proportion of comments in the code review that will be of value to the author of the change.
Evaluating the Use of Requirement Error Abstraction and Classification Method for Preventing Errors during Artifact Creation: A Feasibility Study
The hypothesis was that participants who find more errors during the inspection of a requirements document would make fewer errors when creating their own requirements document, and the overall result supports this hypothesis.
Classification of defect types in requirements specifications: Literature review, proposal and assessment
Recommendations are given to industry and other researchers on the design of classification schemes and treatment of classification results, following rules to build defect taxonomies.
Models for evaluating review effectiveness
Delivering a high-quality, reliable product is the main focus in any software development. The basic quality measure is the defects in the product. Defects found in the later phases of the product…
Analysis of user comments: An approach for software requirements evolution
This paper explores the rich set of user feedback available for third-party mobile applications as a way to extract new/changed requirements for next versions by adapting information retrieval techniques including topic modeling and evaluating them on different publicly available data sets.
A comparative study on sampling techniques for handling class imbalance in streaming data
  • Hien M. Nguyen, E. Cooper, K. Kamei
  • Computer Science
  • The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems
  • 2012
This study suggests that a multiple random under-sampling (MRUS) technique should be a good choice for applications with imbalanced and streaming data, because MRUS is the most effective while still maintaining high speed.