Malware detection using statistical analysis of byte-level file content

@inproceedings{Tabish2009MalwareDU,
  title={Malware detection using statistical analysis of byte-level file content},
  author={Syeda Momina Tabish and Muhammad Zubair Shafiq and Muddassar Farooq},
  booktitle={CSI-KDD '09},
  year={2009}
}
Commercial anti-virus software is unable to provide protection against newly launched (a.k.a. "zero-day") malware. In this paper, we propose a novel malware detection technique based on the analysis of byte-level file content. The novelty of our approach, compared with existing content-based mining schemes, is that it does not memorize specific byte sequences or strings appearing in the actual file content. Our technique is non-signature-based and therefore has the potential to detect…
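
To make the idea of byte-level statistical analysis concrete, here is a minimal Python sketch. It is not the authors' method: the truncated abstract does not list the paper's features. Shannon entropy per fixed-size block is an assumption, chosen only as one example of a statistic that summarizes byte content without memorizing any particular byte sequence. The function names and the 1024-byte block size are hypothetical.

```python
import math
from collections import Counter
from typing import List


def byte_entropy(block: bytes) -> float:
    """Shannon entropy, in bits per byte, of the byte values in `block`."""
    if not block:
        return 0.0
    n = len(block)
    counts = Counter(block)  # maps each byte value (0-255) to its frequency
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def block_features(data: bytes, block_size: int = 1024) -> List[float]:
    """Split `data` into fixed-size blocks and return one entropy per block.

    The block size is an illustrative assumption, not a value from the paper.
    """
    return [
        byte_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]
```

A classifier could be trained on per-block values like these instead of on literal strings. Which statistics and which classifier the paper actually uses cannot be determined from this excerpt.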


Metamorphic Malware Detection Using Statistical Analysis
TLDR
Limitations of signature-based detection for metamorphic viruses are presented, and similarity measures that have been successfully applied to document classification are applied to a static feature, the API calls of an executable, to classify it as malware or benign.
Improving Malware Detection Response Time with Behavior-Based Statistical Analysis Techniques
TLDR
This paper presents a statistics-based method that can be used to identify a specific dynamic behavior of a program and to extract sequences of native system functions with a potentially malicious outcome; it proves to be an effective filtering method.
Detecting malicious files using non-signature-based methods
TLDR
Non-signature techniques for malware detection are investigated, and the feature selection methods best suited for detection are shown to be effective in identifying and classifying morphed malware.
Classification of malware based on file content and characteristics
TLDR
The Random Forest algorithm is shown to be efficient for malware classification based on file content and characteristics, using the Clamp Integrated dataset, which includes 5210 instances.
Similarity Measure for Obfuscated Malware Analysis
TLDR
The authors propose a statistical malware scanner that is effective in discriminating metamorphic malware samples from a large collection of benign executables, and a non-signature-based scanner trained with a small feature length to classify unseen malware and benign executables.
Case Studies on Intelligent Approaches for Static Malware Analysis
TLDR
Intelligent techniques for malware analysis with all preprocessing steps required to analyze any PE sample are outlined, which can help to detect zero day threats.
A Malware Variant Detection Method Based on Byte Randomness Test
TLDR
Experimental results show that the proposed method provides a fast and effective way to detect variants of known malware families.
DLLMiner: structural mining for malware detection
TLDR
This paper proposes an effective and efficient heuristic technique based on static analysis that not only detects malware with very high accuracy, but is also robust against common evasion techniques such as junk injection and packing.
A Survey on the Detection of Windows Desktops Malware
TLDR
A survey of research in malware detection is presented, covering the various techniques proposed or used for detecting new or previously unseen Windows desktop malware.

References

SHOWING 1-10 OF 33 REFERENCES
Embedded Malware Detection Using Markov n-Grams
TLDR
It is shown that the entropy rate of Markov n-grams gets significantly perturbed at malcode embedding locations, and therefore can act as a robust feature for embedded malware detection.
Detection of New Malicious Code Using N-grams Signatures
TLDR
This work employs n-grams analysis to automatically generate signatures from malicious and benign software collections, capable of classifying unseen benign and malicious code.
Towards Stealthy Malware Detection
TLDR
This work proposes the use of statistical binary content analysis of files in order to detect suspicious anomalous file segments that may suggest insertion of malcode, and performs tests to determine whether known malcode can be easily distinguished from otherwise “normal” Windows executables, and whether self-encrypted files may be easy to spot.
Mining specifications of malicious behavior
TLDR
The technique derives such a specification by comparing the execution behavior of a known malware against the execution behaviors of a set of benign programs so that the output of the algorithm can be used by malware detectors to detect malware variants.
CloudAV: N-Version Antivirus in the Network Cloud
TLDR
It is shown that the average length of time to detect new threats by an antivirus engine is 48 days and that retrospective detection can greatly minimize the impact of this delay, and a new model for malware detection on end hosts based on providing antivirus as an in-cloud network service is advocated.
Automatic Classification of Executable Code for Computer Virus Detection
TLDR
It is shown that it is possible to construct an automatic classification system that can distinguish “good” computer code from malicious code (in this case, the code of computer viruses) and could therefore act as an intelligent virus scanner.
A Study of Malcode-Bearing Documents
TLDR
This paper investigates the possibility of detecting embedded malcode in Word documents using two techniques: static content analysis using statistical models of typical document content, and run-time dynamic tests on diverse platforms that can not only detect known malware, but also most zero-day attacks.
Data mining methods for detection of new malicious executables
TLDR
This work presents a data mining framework that detects new, previously unseen malicious executables accurately and automatically, and more than doubles the current detection rates for new malicious executables.
Virus detection using data mining techinques
TLDR
An automatic heuristic method to detect unknown computer viruses based on data mining techniques, namely decision tree and naive Bayesian network algorithms, is proposed, and experiments are carried out to evaluate the effectiveness of the proposed approach.
Classification of packed executables for accurate computer virus detection