Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue
@article{Shu2022DazzleUO,
  title={Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue},
  author={Rui Shu and Tianpei Xia and Laurie Williams and Tim Menzies},
  journal={2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)},
  year={2022},
  pages={144-155}
}
Background: Machine learning techniques have been widely used and show promising performance in many software security tasks, such as software vulnerability prediction. However, the class ratio within software vulnerability datasets is often highly imbalanced, since the percentage of observed vulnerabilities is usually very low. Goal: To help security practitioners address class imbalance issues in software security data and, further, to help build better prediction models with resampled datasets…
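To make the imbalance problem concrete, here is a minimal sketch of measuring a class ratio and applying the simplest resampling baseline, random oversampling (duplicating minority rows). This is not Dazzle's method, which per its title uses optimized GANs to generate minority samples; the function names and toy data below are hypothetical, and a binary 0/1 labeling with 1 as the minority class is assumed.

```python
import random
from collections import Counter

def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have
    the same size (assumes a binary task; hypothetical helper)."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_majority = sum(1 for y in labels if y != minority)
    extra = [rng.choice(minority_rows)
             for _ in range(n_majority - len(minority_rows))]
    return rows + extra, labels + [minority] * len(extra)

# Hypothetical toy data: 2 vulnerable (1) vs 8 non-vulnerable (0) modules.
rows = [[i, i * 2] for i in range(10)]
labels = [1, 1] + [0] * 8
print(imbalance_ratio(labels))  # 4.0
new_rows, new_labels = random_oversample(rows, labels)
print(new_labels.count(1), new_labels.count(0))  # 8 8
```

Duplication adds no new information, which is why approaches such as SMOTE, MAHAKIL, and GAN-based generators try to synthesize new minority samples instead.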
2 Citations
A Novelty Adversarial Loss for Classifying Unbalanced Anomaly Images
- Computer Science; 2022 15th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
- 2022
This paper adds an encoder loss to the existing adversarial loss to obtain a discriminative margin for abnormal samples, achieving higher area under the curve (AUC) scores than existing common methods on the two datasets evaluated.
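The summary above reports results as AUC. As a reminder of what that number means (this is not the paper's evaluation code), here is a minimal sketch of the rank-based definition: AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting as half. The function name and scores are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive (label 1) outscores the negative (label 0); ties count 0.5.
    O(P * N) pairwise version, fine for small illustrative inputs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical anomaly scores: higher means "more abnormal" (label 1).
print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

Because AUC depends only on the ranking of scores, it is not inflated by predicting the majority class, which is one reason it is common for imbalanced anomaly-detection tasks.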
On the effectiveness of data balancing techniques in the context of ML-based test case prioritization
- Computer Science; PROMISE
- 2022
An empirical study applying 19 state-of-the-art data balancing techniques to imbalanced datasets in the test case prioritization (TCP) context, based on the most comprehensive publicly available datasets, demonstrates that data balancing techniques can improve the effectiveness of the best-known ML-based TCP technique for most subjects, with an average improvement of 0.06.
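As an illustration of what a data balancing technique does, here is a minimal sketch of SMOTE-style oversampling, a widely used technique; whether and how it appears among the 19 techniques in the study is not stated here. Each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours. The function name, defaults, and data are hypothetical; Euclidean distance (`math.dist`, Python 3.8+) is assumed.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: each synthetic point lies between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest neighbours of `base` among the other minority samples.
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0.0, 1.0)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class feature vectors.
minority = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
print(smote(minority, n_new=4, k=2))
```

Interpolating rather than duplicating keeps new samples inside the region already occupied by the minority class while avoiding exact copies.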
References
Showing 1-10 of 66 references
Generative Adversarial Networks for Black-Box API Attacks with Limited Training Data
- Computer Science; 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)
- 2018
Stealth attacks with a small footprint (using a small number of API calls) make adversarial machine learning practical in the realistic case where only limited training data is available to the adversary.
Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions
- Computer Science; 2019 IEEE/ACM 7th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE)
- 2019
This paper applies 8 machine learning algorithms to build prediction models and identify the best-performing ones, using a new dataset constructed for this research from vulnerability information in the public databases of the Node Security Project and the Snyk platform, together with code-fixing patches from GitHub.
Using Generative Adversarial Networks for Data Augmentation in Android Malware Detection
- Computer Science; 2021 IEEE Conference on Dependable and Secure Computing (DSC)
- 2021
Experiments show that both traditional augmentation techniques and GANs can improve classification accuracy, but GANs improve the classification model more effectively when the original dataset contains only a small number of samples and recognition accuracy is low.
An Empirical Study on Unsupervised Network Anomaly Detection using Generative Adversarial Networks
- Computer Science; Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence
- 2020
An empirical study on the capability of GANs in network anomaly detection, which adopts two existing GAN models and develops new neural networks for their components, namely the generator and the discriminator.
MAHAKIL: Diversity Based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction
- Computer Science; IEEE Transactions on Software Engineering
- 2018
MAHAKIL, a novel and efficient synthetic oversampling approach for software defect datasets, is introduced. Based on the chromosomal theory of inheritance, it interprets two distinct sub-classes as parents and generates new instances that inherit different traits from each parent, contributing to diversity within the data distribution.
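A simplified sketch of the parent-pairing idea summarized above, under our own reading of the approach: rank minority (defective) rows by their distance from the class centroid, split the ranking into two halves that act as the two parent sub-classes, and create each child as the feature-wise average of a rank-matched pair. Two loud assumptions: the distance uses a diagonal covariance (each feature scaled by its standard deviation) instead of a full Mahalanobis distance, and only a first generation of children is produced, whereas an oversampler would keep generating until the desired balance is reached. All names are hypothetical.

```python
import statistics

def scaled_distance(row, means, stdevs):
    """Distance from the centroid with each feature divided by its standard
    deviation (Mahalanobis distance under an ASSUMED diagonal covariance)."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stdevs)) ** 0.5

def first_generation_children(minority):
    """Hypothetical simplified MAHAKIL-style step: rank rows by distance
    (farthest first), split into two parent bins, average matched pairs."""
    cols = list(zip(*minority))
    means = [statistics.fmean(c) for c in cols]
    # Use 1.0 for constant features to avoid division by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    ranked = sorted(minority,
                    key=lambda r: scaled_distance(r, means, stdevs),
                    reverse=True)
    half = len(ranked) // 2
    parents_a, parents_b = ranked[:half], ranked[half:]
    # Each child inherits the mean of its two parents' feature values.
    return [[(a + b) / 2 for a, b in zip(p1, p2)]
            for p1, p2 in zip(parents_a, parents_b)]

# Hypothetical one-feature defect data.
print(first_generation_children([[0.0], [1.0], [2.0], [10.0]]))  # [[5.5], [1.0]]
```

Pairing far-from-centroid parents with nearer ones spreads the children across the minority region, which is the diversity property the summary highlights.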
Using generative adversarial networks for improving classification effectiveness in credit card fraud detection
- Computer Science; Inf. Sci.
- 2019
IGAN-IDS: An imbalanced generative adversarial network towards intrusion detection system in ad-hoc networks
- Computer Science; Ad Hoc Networks
- 2020
How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)
- Computer Science; Empir. Softw. Eng.
- 2021
This research finds that SWIFT's dual optimization of both the pre-processor and the learner is more useful than optimizing either one individually, and suggests that dual optimization is both practical and useful.
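Why tuning the pre-processor and learner together can beat tuning each alone can be shown with a toy example. This sketch uses plain exhaustive grid search over a made-up objective, not SWIFT's actual optimizer; the option lists, defaults, and `toy_score` function are all hypothetical stand-ins for a real validation score.

```python
from itertools import product

PREP_OPTIONS = [0.0, 0.5, 1.0]  # hypothetical pre-processor knob (0.0 = off)
LEARN_OPTIONS = [1, 5, 10]      # hypothetical learner knob
DEFAULT_PREP, DEFAULT_LEARN = 0.0, 1

def toy_score(prep, learn):
    """Made-up stand-in for a validation score. The interaction term makes
    the best learner setting depend on the pre-processor setting."""
    return 1.0 - 0.3 * abs(prep - 1.0) - 0.05 * abs(learn - 10 * prep)

def tune_learner_only():
    # Pre-processor fixed at its default; only the learner is searched.
    return max(toy_score(DEFAULT_PREP, l) for l in LEARN_OPTIONS)

def tune_preprocessor_only():
    # Learner fixed at its default; only the pre-processor is searched.
    return max(toy_score(p, DEFAULT_LEARN) for p in PREP_OPTIONS)

def tune_both():
    # Dual optimization: search the product of both configuration spaces.
    return max(toy_score(p, l) for p, l in product(PREP_OPTIONS, LEARN_OPTIONS))

print(tune_learner_only(), tune_preprocessor_only(), tune_both())
```

Because the joint grid contains every configuration either single-axis search can reach, it can never score worse here; the cost is that the search space grows multiplicatively, which is why practical dual optimizers use smarter search than an exhaustive grid.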
Phishing URL Detection with Oversampling based on Text Generative Adversarial Networks
- Computer Science; 2018 IEEE International Conference on Big Data (Big Data)
- 2018
This paper trains text generative adversarial networks (text-GANs) on URLs in the minority class and generates synthetic URLs that can be added to the training set; some of the original test URLs are regenerated exactly by the proposed text generative model.
Using Improved Conditional Generative Adversarial Networks to Detect Social Bots on Twitter
- Computer Science; IEEE Access
- 2020
An improved conditional generative adversarial network (improved CGAN) is proposed that extends imbalanced datasets before classifier training to improve the detection accuracy of social bots; it also improves the CGAN convergence judgment condition.