Detecting unknown malicious code by applying classification techniques on OpCode patterns

@article{Shabtai2011DetectingUM,
  title={Detecting unknown malicious code by applying classification techniques on OpCode patterns},
  author={Asaf Shabtai and R. Moskovitch and Clint Feher and S. Dolev and Y. Elovici},
  journal={Security Informatics},
  year={2011},
  volume={1},
  pages={1-22}
}
In previous studies, classification algorithms were employed successfully for the detection of unknown malicious code. Most of these studies extracted features based on byte n-gram patterns in order to represent the inspected files. In this study we represent the inspected files using OpCode n-gram patterns, which are extracted from the files after disassembly. The OpCode n-gram patterns are used as features for the classification process. The main goal of the classification process is to detect unknown…
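The feature representation described in the abstract can be illustrated with a minimal sketch. It assumes the file has already been disassembled into a flat list of opcode mnemonics. The function names, the sample sequence, and the normalization by total n-gram count are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return every contiguous run of n opcodes as a tuple."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def term_frequencies(opcodes, n):
    """Map each n-gram to its share of all n-grams in one file.

    Normalizing by the total count is an assumption made for this
    sketch; the abstract does not say how features are weighted.
    """
    counts = Counter(opcode_ngrams(opcodes, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical opcode sequence, as a disassembler might emit it.
sample = ["push", "mov", "call", "push", "mov", "pop"]
features = term_frequencies(sample, 2)
```

Each file then becomes a sparse vector indexed by n-gram, which a classifier can consume after a feature-selection step.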
Study of Dataset Feature Filtering of OpCode for Malware Detection Using SVM Training Phase
Malware can be defined as any type of malicious code that has the potential to harm a computer or network. To detect unknown malware families, the frequency of the appearance of Opcode (Operation…
Malware detection method based on the control-flow construct feature of software
TLDR
Experimental results illustrate that the proposed feature-selection approach can achieve 97.0% malware detection accuracy and a 3.2% false positive rate with the Random Forest classifier.
Malware detection: program run length against detection rate
TLDR
The findings show that malware can be detected with different program run lengths using a small number of opcodes and how long a program has to be monitored to ensure an accurate support vector machine (SVM) classification of benign and malicious software.
P-Code Based Classification to Detect Malicious VBA Macro
TLDR
This paper discusses the extraction of p-code from macro-based documents and presents the classification of benign and malicious p-code using five learning classifiers; the approach obtained a high accuracy (98.8%) and is promising for macro malware detection in real-world applications.
Static Signature-Based Malware Detection Using Opcode and Binary Information
TLDR
A static signature-based malware detection method, based on opcode and binary file signatures derived from N-gram distribution, is described and improved using a proposed Top K approach, which selects the k most similar files when classifying a new unknown file.
Detecting Unknown Malware on Android by Machine Learning Using the Feature of Dalvik Operation Code
TLDR
This work proposes the use of Dalvik Operation Code on Android, generated by disassembling the application, and uses n-grams of the operation code as features for the classification process, based on text categorization concepts.
Malicious Code Detection through Data Mining Techniques
TLDR
Three algorithms, RIPPER, Naive Bayes, and Multi-Naive Bayes, are proposed using data mining techniques, and a comparison of these algorithms is presented.
Heterogeneous Opcode Space for Metamorphic Malware Detection
TLDR
The proposed statistical non-signature-based detector creates two different meta feature spaces each comprising 25 attributes for their detection of metamorphic malware samples and is recommended to be used to assist commercial AV scanners.
Detecting Malware Based on Opcode N-Gram and Machine Learning
TLDR
This paper uses various n-gram sizes from 1 to 15 to compare different feature selection methods, and performs experiments with different values of MFP, short for malicious files percentage, to demonstrate which setting is better.
N-Gram Analysis in SVM Training Phase Reduction Using Dataset Feature Filtering for Malware Detection
An n-gram is a sub-sequence of n items from a given sequence. Various areas of statistical natural language processing and genetic sequence analysis use n-gram analysis. In which sequence…

References

Unknown Malcode Detection Using OPCODE Representation
TLDR
This work presents a full methodology for the detection of unknown malicious code, based on text categorization concepts, and indicates that greater than 99% accuracy can be achieved through the use of a training set that has a malicious file percentage lower than 15%, which is higher than in the previous experience with byte sequence n-gram representation.
Unknown malcode detection and the imbalance problem
TLDR
This work presents a methodology for the detection of unknown malicious code, which examines concepts from text categorization, based on n-grams extraction from the binary code and feature selection, and indicates that greater than 95% accuracy can be achieved through the use of a training set that has a malicious file content of less than 33.3%.
Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey
TLDR
A framework for detecting new malicious code in executable files can be designed to achieve very high accuracy while maintaining low false positives (i.e. misclassifying benign files as malicious) and should include training of multiple classifiers on various types of features, as well as an active learning mechanism to maintain high detection accuracy.
Learning and Classification of Malware Behavior
TLDR
The effectiveness of the proposed method for learning and discrimination of malware behavior is demonstrated, especially in detecting novel instances of malware families previously not recognized by commercial anti-virus software.
Idea: Opcode-Sequence-Based Malware Detection
TLDR
It is shown that this method provides an effective way to detect variants of known malware families, based on the frequency of appearance of opcode sequences; a method is described to mine the relevance of each opcode and weigh the frequency of each opcode sequence.
Data mining methods for malware detection using instruction sequences
TLDR
A novel idea of automatically identifying critical instruction sequences that can classify between malicious and clean programs using data mining techniques is presented. The problem is formulated as binary classification, and logistic regression, neural network, and decision tree models are built.
McBoost: Boosting Scalability in Malware Collection and Analysis Using Statistical Classification of Executables
TLDR
A fast statistical malware detection tool that is intended to improve the scalability of existing malware collection and analysis approaches, McBoost reduces the overall time of analysis by classifying and filtering out the least suspicious binaries and passing only the most suspicious ones to a detailed binary analysis process for signature extraction.
PolyUnpack: Automating the Hidden-Code Extraction of Unpack-Executing Malware
TLDR
The results from the experiments show the approach can be used to significantly reduce the time required to analyze such malware, and to improve the performance of malware detection tools.
Limits of Static Analysis for Malware Detection
TLDR
A binary obfuscation scheme that relies on opaque constants, which are primitives that allow us to load a constant into a register such that an analysis tool cannot determine its value, demonstrates that static analysis techniques alone might no longer be sufficient to identify malware.
Detection of unknown computer worms based on behavioral classification of the host
TLDR
This paper focuses on the feasibility of accurately detecting unknown worm activity in individual computers while minimizing the required set of features collected from the monitored computer.