Corpus ID: 236957354

Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints

@article{Nguyen2021LeveragingUF,
  title={Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints},
  author={A. Nguyen and Edward Raff and Charles K. Nicholas and James Holt},
  journal={ArXiv},
  year={2021},
  volume={abs/2108.04081}
}
The detection of malware is a critical task for the protection of computing environments. This task often requires extremely low false positive rates (FPR) of 0.01% or even lower, for which modern machine learning has no readily available tools. We introduce the first broad investigation of the use of uncertainty for malware detection across multiple datasets, models, and feature types. We show how ensembling and Bayesian treatments of machine learning methods for static malware detection allow… 
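An FPR budget such as 0.01% is set on benign data. The decision threshold is chosen from the benign scores, and the detection rate is then read off the malware scores. The following minimal Python sketch shows that bookkeeping; it is not the authors' code. The function names are hypothetical, and the scores could come from a single model or from an average of an ensemble's outputs.

```python
import math


def threshold_at_fpr(benign_scores, target_fpr):
    """Return a threshold t such that flagging samples with score > t
    marks at most target_fpr of the benign scores as malicious."""
    if not benign_scores:
        raise ValueError("need at least one benign score")
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign samples we may flag; flooring keeps the empirical
    # FPR at or below the target.
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # the budget covers every benign sample
    # Only the `allowed` highest benign scores can lie strictly above this value.
    return ranked[allowed]


def tpr_at_threshold(malware_scores, threshold):
    """Fraction of malware scored strictly above the threshold (detection rate)."""
    return sum(s > threshold for s in malware_scores) / len(malware_scores)


def empirical_fpr(benign_scores, threshold):
    """Fraction of benign samples scored strictly above the threshold."""
    return sum(s > threshold for s in benign_scores) / len(benign_scores)


# Toy usage: 10 benign scores and a 10% budget allow exactly one false positive.
benign = [i / 10 for i in range(10)]
malware = [0.85, 0.95, 0.5, 0.81]
t = threshold_at_fpr(benign, 0.1)
print(t, tpr_at_threshold(malware, t))  # 0.8 0.75
```

Because the budget is floored, a 0.01% target over 10,000 benign files permits exactly one false positive. At this operating point the threshold therefore depends on very few benign samples.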

Can We Leverage Predictive Uncertainty to Detect Dataset Shift and Adversarial Examples in Android Malware Detection?

TLDR
Predictive uncertainty helps achieve reliable malware detection in the presence of dataset shift but cannot cope with adversarial evasion attacks; approximate Bayesian methods are promising for calibrating malware detectors and helping them generalize under dataset shift.
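A common way to turn an ensemble or an approximate Bayesian model (e.g. MC dropout) into an uncertainty signal is to decompose the entropy of the averaged prediction. It splits into an aleatoric part (the mean entropy of the members) and an epistemic part (the mutual-information term, which is large when members disagree). The sketch below illustrates this standard decomposition for a binary malware score; it is not the cited paper's method. The review policy and its `max_epistemic=0.1` threshold are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def decompose_uncertainty(member_probs):
    """Split an ensemble's predictive uncertainty for one sample.

    member_probs: P(malware) from each ensemble member (or each MC-dropout pass).
    Returns (mean_prob, total, aleatoric, epistemic), where total = H(mean),
    aleatoric = mean of member entropies, epistemic = total - aleatoric.
    """
    n = len(member_probs)
    mean_prob = sum(member_probs) / n
    total = binary_entropy(mean_prob)
    aleatoric = sum(binary_entropy(p) for p in member_probs) / n
    # Non-negative in exact arithmetic (Jensen's inequality); clamp rounding noise.
    epistemic = max(0.0, total - aleatoric)
    return mean_prob, total, aleatoric, epistemic


def flag_for_review(member_probs, max_epistemic=0.1):
    """Hypothetical policy: route samples on which the members disagree to an analyst."""
    return decompose_uncertainty(member_probs)[3] > max_epistemic


print(flag_for_review([0.05, 0.97, 0.5]))   # True: members disagree strongly
print(flag_for_review([0.96, 0.97, 0.95]))  # False: members agree
```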

A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels

TLDR
It is proved that bounds on specific metrics used to evaluate clustering algorithms and multi-class classifiers can be computed without reference labels, and a procedure is introduced that uses an approximate ground truth refinement (AGTR) to identify inaccurate evaluation results produced from datasets of dubious quality.

Firenze: Model Evaluation Using Weak Signals

TLDR
This paper introduces Firenze, a novel framework for comparative evaluation of ML models’ performance using domain expertise encoded into scalable functions called markers, and shows that markers computed and combined over select subsets of samples, called regions of interest, can provide a strong estimate of the models’ real-world performance.

Improving Out-of-Distribution Detection via Epistemic Uncertainty Adversarial Training

TLDR
A simple adversarial training scheme is devised that incorporates an attack on the epistemic uncertainty predicted by the dropout ensemble; it improves OOD detection performance on standard data and improves AUC at FPR < 1% from near-random-guessing performance to ≥ 0 .
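AUC restricted to FPR < 1% can be computed as a partial AUC. The following is a minimal sketch. Normalizing by `max_fpr` is an assumption made here: under it a perfect detector scores 1.0 and a random one scores max_fpr / 2. The cited paper may normalize differently. For comparison, scikit-learn's `roc_auc_score(..., max_fpr=...)` returns the McClish-standardized value, which maps random guessing to 0.5.

```python
def roc_points(scores, labels):
    """ROC curve as (FPR, TPR) points; labels are 1 = malware, 0 = benign.
    Tied scores are consumed together, so ties produce a diagonal segment."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both classes")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample sharing this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(scores, labels, max_fpr=0.01):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr],
    divided by max_fpr so that a perfect detector scores 1.0."""
    points = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate the TPR where the segment crosses max_fpr.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


print(partial_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], max_fpr=0.5))  # 1.0
```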

References

LUNA: Quantifying and Leveraging Uncertainty in Android Malware Analysis through Bayesian Machine Learning

  • M. Backes, M. Nauman
  • 2017 IEEE European Symposium on Security and Privacy (EuroS&P)
TLDR
Bayesian machine learning, an alternative paradigm based on Bayesian statistical inference, is used to preserve uncertainty through every step of the calculation; this reduces incorrect decisions and significantly improves the accuracy of Android app classification.

Optimized Zero False Positives Perceptron Training for Malware Detection

TLDR
This paper proposes a modified perceptron algorithm that detects malware samples while keeping the false positive rate low during training, and provides a method for speeding up training while maintaining the same accuracy.
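The TLDR does not spell out the modified update rule. The sketch below therefore substitutes a simpler and plainly different technique to illustrate the zero-false-positive goal: a standard perceptron, followed by a post-hoc bias shift that pushes every benign training sample below the decision boundary. All function names, the toy data, and the `margin` parameter are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Classic perceptron; labels are +1 (malware) or -1 (benign)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (dot(w, x) + b) <= 0:  # misclassified or on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def shift_bias_for_zero_fp(w, b, samples, labels, margin=1e-6):
    """Lower the bias until every benign training sample scores below 0.
    Requires at least one benign sample."""
    worst_benign = max(dot(w, x) + b for x, y in zip(samples, labels) if y == -1)
    if worst_benign >= 0:
        b -= worst_benign + margin
    return b


def is_malware(w, b, x):
    return dot(w, x) + b > 0


# Toy usage: two malware and three benign feature vectors.
X = [[3.0, 1.0], [2.0, 2.0], [0.0, 0.0], [1.0, 0.5], [2.5, 0.0]]
y = [1, 1, -1, -1, -1]
w, b = train_perceptron(X, y)
b = shift_bias_for_zero_fp(w, b, X, y)
print([is_malware(w, b, x) for x in X])
```

The bias shift trades detection rate for false positives. It guarantees zero false positives only on the training set, not on unseen benign files.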

Deep neural network based malware detection using two dimensional binary program features

TLDR
A deep neural network-based malware detection system developed by Invincea is introduced; it achieves a usable detection rate at an extremely low false positive rate and scales to real-world training-set volumes on commodity hardware.

Ensemble Models for Data-driven Prediction of Malware Infections

TLDR
ESM can effectively predict malware infection ratios over time, up to 4 times better than several baselines on various metrics, and its performance is stable and robust even when the number of detected infections is low.

AVclass: A Tool for Massive Malware Labeling

TLDR
AVclass is described: an automatic labeling tool that, given the AV labels for a potentially massive number of samples, outputs the most likely family name for each sample. It implements novel automatic techniques to address three key challenges: normalization, removal of generic tokens, and alias detection.
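The plurality-vote idea behind AVclass can be sketched in a few lines. Each engine's label is normalized into tokens, generic tokens are dropped, and aliases are mapped; each engine then votes once per remaining token. This is a simplified illustration, not AVclass's implementation. `GENERIC_TOKENS`, `ALIASES`, the length-4 token filter, and `min_votes=2` are hypothetical stand-ins for AVclass's curated lists and rules.

```python
import re
from collections import Counter

# Hypothetical, tiny stand-ins for AVclass's curated generic-token and alias lists.
GENERIC_TOKENS = {"trojan", "win32", "generic", "malware", "agent", "variant"}
ALIASES = {"zeus": "zbot"}


def candidate_tokens(av_label):
    """Normalize one AV label into a set of lowercase candidate family tokens."""
    tokens = re.split(r"[^a-z0-9]+", av_label.lower())
    # Drop short tokens, pure numbers and generic tokens, then map aliases.
    return {
        ALIASES.get(t, t)
        for t in tokens
        if len(t) >= 4 and not t.isdigit() and t not in GENERIC_TOKENS
    }


def plurality_family(av_labels, min_votes=2):
    """Most common family token across engines, or None if support is too weak."""
    votes = Counter()
    for label in av_labels:
        votes.update(candidate_tokens(label))  # one vote per engine per token
    if not votes:
        return None
    family, count = votes.most_common(1)[0]
    return family if count >= min_votes else None


labels = ["Trojan.Win32.Zbot.abcd", "Win32/Zeus.gen", "Trojan-Spy.Win32.Zbot", "Generic.Malware"]
print(plurality_family(labels))  # zbot
```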

McBoost: Boosting Scalability in Malware Collection and Analysis Using Statistical Classification of Executables

TLDR
McBoost, a fast statistical malware detection tool intended to improve the scalability of existing malware collection and analysis approaches, reduces the overall analysis time by classifying and filtering out the least suspicious binaries and passing only the most suspicious ones to a detailed binary analysis process for signature extraction.

Malicious PDF detection using metadata and structural features

TLDR
This paper presents a framework for robust detection of malicious documents through machine learning based on features extracted from document metadata and structure, and shows that the Random Forests classification method, an ensemble classifier that randomly selects features for each individual classification tree, yields the best detection rates, even on previously unseen malware.

EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models

TLDR
The authors hope that the dataset, code and baseline model provided by EMBER will help invigorate machine learning research for malware detection, in much the same way that benchmark datasets have advanced computer vision research.

Ensemble Learning for Low-Level Hardware-Supported Malware Detection

TLDR
This paper explores the use of both specialized detectors and ensemble learning techniques to improve performance of the hardware detector, and reduces the false positive rate by more than half compared to a single detector, while increasing the detection rate.

DREBIN: Efficient and Explainable Detection of Android Malware in Your Pocket (IFI-TB-2013-02)

TLDR
DREBIN, a lightweight method for detecting Android malware that can identify malicious applications directly on the smartphone, is proposed; it outperforms several related approaches and detects 94% of the malware with few false alarms.
...