Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue

  title={Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue},
  author={Rui Shu and Tianpei Xia and Laurie Williams and Tim Menzies},
  journal={2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)},
  • Rui Shu, Tianpei Xia, Laurie Williams, T. Menzies
  • Published 22 March 2022
  • Computer Science
  • 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)
Background: Machine learning techniques have been widely used and demonstrate promising performance in many software security tasks such as software vulnerability prediction. However, the class ratio within software vulnerability datasets is often highly imbalanced (since the percentage of observed vulnerabilities is usually very low). Goal: To help security practitioners address software security data class imbalance issues and further help build better prediction models with resampled datasets…


A Novelty Adversarial Loss for Classifying Unbalanced Anomaly Images

This paper adds an encoder loss to the existing adversarial loss to obtain a discriminative margin for abnormal samples, achieving a higher area under the curve (AUC) than common existing methods on the two datasets evaluated.

On the effectiveness of data balancing techniques in the context of ML-based test case prioritization

An empirical study applying 19 state-of-the-art data balancing techniques to imbalanced data sets in the test case prioritization (TCP) context, based on the most comprehensive publicly available datasets, demonstrates that data balancing techniques can improve the effectiveness of the best-known ML-based TCP technique for most subjects, with an average improvement of 0.06.



Generative Adversarial Networks for Black-Box API Attacks with Limited Training Data

Stealth attacks with a small footprint (using a small number of API calls) make adversarial machine learning practical under the realistic case where limited training data is available to the adversary.

Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions

This paper applies 8 machine learning algorithms to build prediction models and find the best-performing ones, using a new dataset constructed for this research from vulnerability information in the public databases of the Node Security Project and the Snyk platform and from code-fixing patches on GitHub.

Using Generative Adversarial Networks for Data Augmentation in Android Malware Detection

Experiments show that both traditional techniques and GANs can improve classification accuracy, but GANs improve the classification model more effectively when the original dataset has a small number of samples and recognition accuracy is lower.

An Empirical Study on Unsupervised Network Anomaly Detection using Generative Adversarial Networks

An empirical study on the capability of GANs in network anomaly detection, which adopts two existing GAN models and develops new neural networks for their components, i.e., generator and discriminator.

MAHAKIL: Diversity Based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction

This paper introduces MAHAKIL, a novel and efficient synthetic oversampling approach for software defect datasets based on the chromosomal theory of inheritance. It treats two distinct sub-classes as parents and generates a new instance that inherits different traits from each parent, which contributes to diversity within the data distribution.
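
The parent-and-child idea in this summary can be illustrated with a minimal sketch. The summary does not say how the two sub-classes are formed or how traits are combined, so the pre-split groups, the pairing by position, and the feature-wise mean used below are assumptions made for illustration only, not MAHAKIL's actual procedure. All function and variable names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def breed(parent_a: Vector, parent_b: Vector) -> List[float]:
    """Create one child from two parents.

    Assumption: the child takes the feature-wise mean of its parents.
    """
    return [(a + b) / 2.0 for a, b in zip(parent_a, parent_b)]


def oversample_by_pairing(
    group_a: Sequence[Vector], group_b: Sequence[Vector], n_new: int
) -> List[List[float]]:
    """Pair rows from two minority sub-groups by position and
    return at most n_new synthetic children."""
    children: List[List[float]] = []
    for parent_a, parent_b in zip(group_a, group_b):
        if len(children) >= n_new:
            break
        children.append(breed(parent_a, parent_b))
    return children


# Hypothetical minority-class rows, already split into two sub-groups.
group_a = [[1.0, 4.0], [2.0, 6.0]]
group_b = [[3.0, 0.0], [4.0, 2.0]]
synthetic = oversample_by_pairing(group_a, group_b, n_new=2)
print(synthetic)  # [[2.0, 2.0], [3.0, 4.0]]
```

Each synthetic row lies between its two parents, so new minority instances are spread across the region spanned by the two sub-groups rather than clustered around a single neighbourhood.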

How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)

This research finds that SWIFT's dual optimization of both pre-processor and learner is more useful than optimizing each of them individually, and suggests that dual optimization is both practical and useful.

Phishing URL Detection with Oversampling based on Text Generative Adversarial Networks

This paper trains text generative adversarial networks (text-GANs) on URLs in the minority class and generates synthetic URLs that can be made part of the training set. Some of the original test URLs are exactly regenerated by the proposed text generative model.

Using Improved Conditional Generative Adversarial Networks to Detect Social Bots on Twitter

An improved conditional generative adversarial network (improved CGAN) is proposed to extend imbalanced data sets before training classifiers, in order to improve the detection accuracy of social bots. The approach also improves the CGAN convergence judgment condition.