• Corpus ID: 245650160

Rank-1 Similarity Matrix Decomposition For Modeling Changes in Antivirus Consensus Through Time

@article{Joyce2022Rank1SM,
  title={Rank-1 Similarity Matrix Decomposition For Modeling Changes in Antivirus Consensus Through Time},
  author={Robert J. Joyce and Edward Raff and Charles Nicholas},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.00757}
}
Although groups of strongly correlated antivirus engines are known to exist, at present there is limited understanding of how or why these correlations came to be. Using a corpus of 25 million VirusTotal reports representing over a decade of antivirus scan data, we challenge prevailing wisdom that these correlations primarily originate from "first-order" interactions such as antivirus vendors copying the labels of leading vendors. We introduce the Temporal Rank-1 Similarity Matrix decomposition… 

References

On the Lack of Consensus in Anti-Virus Decisions: Metrics and Insights on Building Ground Truths of Android Malware

TLDR
This paper investigates the lack of agreement among AV engines and proposes a set of metrics that quantitatively describe the different dimensions of this lack of consensus. It considers not only the binary decisions of AVs but also the notoriously hard problem of the labels that AVs associate with suspicious files.

Measuring and Modeling the Label Dynamics of Online Anti-Malware Engines

TLDR
Takes a data-driven approach to categorize, reason about, and validate common labeling methods used by researchers, and empirically shows that certain engines fail to perform in-depth analysis on submitted files and can easily produce false positives.

Does Malware Detection Improve with Diverse AntiVirus Products? An Empirical Study

TLDR
This study sends malware samples to the AVs available through the VirusTotal service to evaluate the detection benefit of using more than one AV, and provides additional evidence that diversity improves detection capabilities.

EM Meets Malicious Data: A Novel Method for Massive Malware Family Inference

TLDR
This paper examines the problem of inferring underground family truth from inconsistent antivirus vendor labels as a maximum likelihood estimation problem with hidden random variables and proposes a solution based on the expectation-maximization algorithm.

AVclass: A Tool for Massive Malware Labeling

TLDR
Describes AVclass, an automatic labeling tool that, given the AV labels for a potentially massive number of samples, outputs the most likely family names for each sample. It implements novel automatic techniques to address three key challenges: normalization, removal of generic tokens, and alias detection.

A Survey of Machine Learning Methods and Challenges for Windows Malware Classification

TLDR
This survey aims to be useful both to cybersecurity practitioners who wish to learn more about how machine learning can be applied to the malware problem, and to give data scientists the necessary background into the challenges in this uniquely complicated space.

AV-Meter: An Evaluation of Antivirus Scans and Labels

TLDR
The literature lacks any systematic study validating the performance of antivirus scanners or the reliability of their labels and detections, even though researchers rely on AV labels to establish a ground-truth baseline against which to compare their detection and classification algorithms.

Limits of Static Analysis for Malware Detection

TLDR
A binary obfuscation scheme that relies on opaque constants, which are primitives that allow us to load a constant into a register such that an analysis tool cannot determine its value, demonstrates that static analysis techniques alone might no longer be sufficient to identify malware.

Tensor Decompositions and Applications

This survey provides an overview of higher-order tensor decompositions, their applications, and available software. A tensor is a multidimensional or $N$-way array. Decompositions of higher-order