• Corpus ID: 207794518

Improved Recognition of Security Bugs via Dual Hyperparameter Optimization

@article{Shu2019ImprovedRO,
  title={Improved Recognition of Security Bugs via Dual Hyperparameter Optimization},
  author={Rui Shu and Tianpei Xia and Jianfeng Chen and Laurie Ann Williams and Tim Menzies},
  journal={ArXiv},
  year={2019},
  volume={abs/1911.02476}
}
Background: Security bugs need to be handled by small groups of engineers before being widely discussed (otherwise the general public becomes vulnerable to hackers who exploit those bugs). But learning how to separate security bugs from other bugs is challenging since they may occur very rarely. Data mining that can find such scarce targets requires extensive tuning effort. Goal: The goal of this research is to aid practitioners as they struggle to tune methods that try to distinguish…

LDA Categorization of Security Bug Reports in Chromium Projects
TLDR
This work studied the security bug reports of the Chromium project and looked into three main aspects of these bug reports: how frequently they are reported, how quickly they get fixed, and whether LDA is effective in grouping these reports into known vulnerability types.
A Pragmatic Approach for Hyper-Parameter Tuning in Search-based Test Case Generation
TLDR
A new metric is proposed (“Tuning Gain”), which estimates how cost-effective tuning a particular class is, and a tuning approach called Meta-GA is used, which shows that for a low tuning budget, prioritizing classes outperforms the alternatives in terms of extra covered branches.
FRUGAL: Unlocking Semi-Supervised Learning for Software Analytics
  • Huy Tu, T. Menzies
  • Computer Science
    2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE)
  • 2021
TLDR
It is asserted that FRUGAL can save considerable effort in data labelling especially in validating prior work or researching new problems, and suggested that proponents of complex and expensive methods should always baseline such methods against simpler and cheaper alternatives.
Mining Workflows for Anomalous Data Transfers
TLDR
X-FLASH is developed, a network anomaly detection tool for faulty TCP workflow transfers that incorporates novel hyperparameter tuning and data mining approaches for improving the performance of the machine learning algorithms to accurately classify the anomalous TCP packets.
Mining Scientific Workflows for Anomalous Data Transfers
TLDR
X-FLASH is developed, a network anomaly detection tool for faulty TCP workflow transfers that incorporates novel hyperparameter tuning and data mining approaches for improving the performance of the machine learning algorithms to accurately classify the anomalous TCP packets.
Bayesian Hyperparameter Optimization and Ensemble Learning for Machine Learning Models on Software Effort Estimation
TLDR
The RF method based on AdaBoost ensemble learning and Bayesian optimization outperforms this approach and assigns a feature importance rating, which makes it a promising tool in software effort prediction.

References

PerfLearner: Learning from Bug Reports to Understand and Generate Performance Test Frames
  • Xue Han, Tingting Yu, D. Lo
  • Computer Science
    2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)
  • 2018
TLDR
An automated approach, PerfLearner, is designed and evaluated to extract execution commands and input parameters from descriptions of performance bug reports and use them to generate test frames for guiding actual performance test case generation.
Identify Severity Bug Report with Distribution Imbalance by CR-SMOTE and ELM
TLDR
This study proposes an enhanced oversampling approach called CR-SMOTE to enhance the classification of bug reports with a realistically imbalanced severity distribution, and uses an extreme learning machine (ELM) — a feedforward neural network with a single layer of hidden nodes — to predict the bug severity.
High-Impact Bug Report Identification with Imbalanced Learning Strategies
TLDR
The results show that different variants perform differently, and that the best-performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, outperform the F1-scores of the two state-of-the-art approaches by Thung et al. and Garcia and Shihab.
Automated Identification of High Impact Bug Reports Leveraging Imbalanced Learning Strategies
TLDR
The effectiveness of various imbalanced learning strategies built upon a number of well-known classification algorithms is investigated, and under-sampling is found to be the best imbalanced learning strategy with naive Bayes multinomial for high impact bug identification.
Text Filtering and Ranking for Security Bug Report Prediction
TLDR
FARSEC, a framework for filtering and ranking bug reports to reduce the presence of security-related keywords, is proposed and demonstrated to improve the performance of text-based prediction models for security bug reports in 90 percent of cases.
R2Fix: Automatically Generating Bug Fixes from Bug Reports
TLDR
R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically and could have shortened and saved up to an average of 63 days of bug diagnosis and patch generation time.
Ensemble Data Reduction Techniques and Multi-RSMOTE via Fuzzy Integral for Bug Report Classification
TLDR
This paper addresses the problem of low quality and class imbalance in identifying the severity of bug reports by combining feature selection with instance selection to simultaneously reduce the bug report dimension and the word dimension, which yields a small-scale, high-quality reduced data set.
Fusion of Multi-RSMOTE With Fuzzy Integral to Classify Bug Reports With an Imbalanced Distribution
TLDR
An improved synthetic minority oversampling technique is proposed to avoid the degraded performance caused by class imbalance in bug report datasets, and an ensemble algorithm based on Choquet fuzzy integral is employed to combine the wisdom of crowds and make better decisions.
Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction
TLDR
A simple but improved supervised model called CBS+, which leverages the ideas of both EALR and LT, is proposed, and the number of defective changes detected by CBS+ is comparable to LT’s result, while CBS+ significantly reduces context switches and initial false alarms before first success.