CharBot: A Simple and Effective Method for Evading DGA Classifiers

@article{Peck2019CharBotAS,
  title={CharBot: A Simple and Effective Method for Evading DGA Classifiers},
  author={Jonathan Peck and Claire Nie and Raaghavi Sivaguru and Charles Grumer and Femi G. Olumofin and Bin Yu and Anderson Clayton Alves Nascimento and Martine De Cock},
  journal={IEEE Access},
  year={2019},
  volume={7},
  pages={91759-91771}
}
Domain generation algorithms (DGAs) are commonly leveraged by malware to create lists of domain names, which can be used for command and control (C&C) purposes. [...] CharBot is very simple, effective, and requires no knowledge of the targeted DGA classifiers. We show that retraining the classifiers on CharBot samples is not a viable defense strategy. We believe these findings show that DGA classifiers are inherently vulnerable to adversarial attacks if they rely only on the domain name string to…
Inline Detection of DGA Domains Using Side Information
TLDR
This work trains and evaluates state-of-the-art deep learning and random forest (RF) classifiers for DGA detection using side information that is harder for adversaries to manipulate than the domain name itself, and finds that the DGA classifiers that rely on both the domain name and side information have high performance and are more robust against adversaries.
Hardening DGA Classifiers Utilizing IVAP
TLDR
Inductive Venn–Abers predictors (IVAPs) are proposed for calibrating the output of existing ML models for DGA classification, a computationally efficient procedure that consistently improves the predictive accuracy of classifiers at the expense of not offering predictions for a small subset of inputs and consuming an additional amount of training data.
CLETer: A Character-level Evasion Technique Against Deep Learning DGA Classifiers
TLDR
CLETer, an improved DGA that provides a character-level evasion technique against state-of-the-art DGA classifiers, is proposed, and adversarial retraining is shown to be a viable defense strategy against CLETer.
On the use of DGAs in malware: an everlasting competition of detection and evasion
TLDR
This paper compares two different approaches on the same set of DGAs: classical machine learning using manually engineered features and a 'deep learning' recurrent neural network and shows that the deep learning approach performs consistently better on all of the tested DGAs, with an average classification accuracy.
MaskDGA: An Evasion Attack Against DGA Classifiers and Adversarial Defenses
TLDR
This paper presents MaskDGA, an evasion technique that uses adversarial learning to modify AGD names in order to evade inline DGA classifiers, without the need for the attacker to possess any knowledge about the DGA classifier's architecture or parameters, and proposes an extension to MaskDGA that allows an attacker to omit a subset of the modified AGD names based on the classification results of the attacker's trained model, to achieve a desired evasion rate.
DomainGAN: Generating Adversarial Examples to Attack Domain Generation Algorithm Classifiers
TLDR
GAN-based DGAs are superior to traditional DGAs in evading DGA classifiers, and of the variants, the Wasserstein GAN with Gradient Penalty (WGANGP) is the highest-performing DGA for both offensive and defensive uses.
Khaos: An Adversarial Neural Network DGA With High Anti-Detection Ability
TLDR
Khaos, a novel DGA with high anti-detection ability based on neural language models and the Wasserstein Generative Adversarial Network (WGAN), is proposed, and training the existing detection approach on a dataset that includes domain names generated by Khaos is found to improve its detection ability.
Analyzing the real-world applicability of DGA classifiers
TLDR
This paper proposes one novel classifier based on residual neural networks for each of the two tasks, extensively evaluates them as well as previously proposed classifiers in a unified setting, and compares them with respect to explainability, robustness, and training and classification speed.
The More, the Better: A Study on Collaborative Machine Learning for DGA Detection
TLDR
A comprehensive collaborative learning study that evaluates a total of eleven different variations of collaborative learning using three different state-of-the-art classifiers and shows that collaborative ML can lead to a reduction in FPR by up to 51.7%.
Towards Adversarial Resilience in Proactive Detection of Botnet Domain Names by using MTD
TLDR
This work focuses on adversarial learning in DNS-based IDSs from the perspective of a network operator and presents a concept for making existing and future machine-learning-based IDSs more resilient against adversarial learning attacks by applying multi-level Moving Target Defense strategies.

References

An Evaluation of DGA Classifiers
TLDR
This paper compares and evaluates machine learning methods that classify domain names as benign or DGA, and labels the latter according to their malware family, and finds that all state-of-the-art classifiers are significantly better at catching domain names from malware families with a time-dependent seed compared to time-invariant DGAs.
Detection of algorithmically generated domain names used by botnets: a dual arms race
TLDR
This paper compares two different approaches on the same set of DGAs: classical machine learning using manually engineered features and a 'deep learning' recurrent neural network and shows that the deep learning approach performs consistently better on all of the tested DGAs.
MaskDGA: A Black-box Evasion Technique Against DGA Classifiers and Adversarial Defenses
TLDR
MaskDGA is presented, a practical adversarial learning technique that adds perturbation to the character-level representation of algorithmically generated domain names in order to evade DGA classifiers, without the attacker having any knowledge about the DGA classifier's architecture and parameters.
DeepDGA: Adversarially-Tuned Domain Generation and Detection
TLDR
The hypothesis of whether adversarially generated domains may be used to augment training sets in order to harden other machine learning models against yet-to-be-observed DGAs is tested.
A LSTM based framework for handling multiclass imbalance in DGA botnet detection
TLDR
A novel LSTM.MI algorithm is proposed to combine both binary and multiclass classification models, where the original LSTM is adapted to be cost-sensitive; it preserves high accuracy on the non-DGA class while helping recognize 5 additional bot families.
Inline DGA Detection with Deep Networks
TLDR
This work proposes a novel way to label a large volume of data collected from real traffic as DGA/non-DGA, and uses deep learning techniques, which can be trained with large amounts of real traffic rather than small synthetic data sets and therefore have better performance.
Inline Detection of Domain Generation Algorithms with Context-Sensitive Word Embeddings
TLDR
This work proposes a novel approach that combines context-sensitive word embeddings with a simple fully-connected classifier to perform classification of domains based on word-level information and shows that this architecture reliably outperformed existing techniques on wordlist-based DGA families.
Automatic Detection of Malware-Generated Domains with Recurrent Neural Models
TLDR
A machine learning approach based on recurrent neural networks is able to detect domain names generated by DGAs with high precision and can automatically detect 93% of malware-generated domain names for a false positive rate of 1:100.
FANCI: Feature-based Automated NXDomain Classification and Intelligence
TLDR
This work shows that the FANCI system yields a very high classification accuracy at a low false positive rate, generalizes very well, and is able to identify previously unknown DGAs.
Character Level based Detection of DGA Domain Names
TLDR
Training and evaluating on a dataset with 2M domain names shows that there is surprisingly little difference between various convolutional neural network and recurrent neural network based architectures in terms of accuracy, prompting a preference for the simpler architectures, since they are faster to train and to score, and less prone to overfitting.