Unknown Malcode Detection Using OPCODE Representation

@inproceedings{Moskovitch2008UnknownMD,
  title={Unknown Malcode Detection Using OPCODE Representation},
  author={R. Moskovitch and Clint Feher and Nir Tzachar and Eugene Berger and Marina Gitelman and S. Dolev and Y. Elovici},
  booktitle={EuroISI},
  year={2008}
}
The recent growth in network usage has motivated the creation of new malicious code for various purposes, including economic ones. [...] We then use n-grams of the OpCodes as features for the classification process. We present a full methodology for the detection of unknown malicious code, based on text categorization concepts. We performed an extensive evaluation on a test collection of more than 30,000 files, in which we examined the OpCode n-gram representation in depth and investigated the…
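To make the OpCode n-gram representation concrete, here is a minimal Python sketch. It assumes the executable has already been disassembled into a list of opcode mnemonics (disassembly is not shown). Each n-gram is weighted by its term frequency, normalized by the count of the most frequent n-gram in the file. That normalization is a common text categorization choice and is an assumption here, not necessarily the exact weighting used in the paper. The function names `opcode_ngrams` and `tf_vector` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous n-gram of opcodes (sliding window of size n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tf_vector(opcodes: Sequence[str], n: int) -> Dict[NGram, float]:
    """Term-frequency vector of opcode n-grams, normalized by the largest count.

    Assumption: max-TF normalization borrowed from text categorization,
    not necessarily the paper's exact weighting scheme.
    """
    counts = Counter(opcode_ngrams(opcodes, n))
    if not counts:
        return {}  # file shorter than n opcodes: no features
    max_count = max(counts.values())
    return {gram: c / max_count for gram, c in counts.items()}


# Hypothetical opcode sequence, as if produced by disassembling an executable.
seq = ["push", "mov", "call", "push", "mov", "call", "ret"]
vec = tf_vector(seq, 2)
for gram, weight in sorted(vec.items()):
    print(" ".join(gram), weight)
# call push 0.5
# call ret 0.5
# mov call 1.0
# push mov 1.0
```

Each file thus becomes a sparse vector over n-grams that can be fed to a standard classifier. Larger values of n capture more local opcode order, at the cost of a much larger and sparser feature space.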
Citations

Detecting unknown malicious code by applying classification techniques on OpCode patterns
The imbalance problem is investigated with reference to several real-life scenarios in which malicious files are expected to make up about 10% of all inspected files, and a chronological evaluation showed a clear trend: performance improves as the training set becomes more up to date.
Opcode sequences as representation of executables for data-mining-based unknown malware detection
This paper proposes a new method to detect unknown malware families based on the frequency of appearance of opcode sequences, and describes a technique to mine the relevance of each opcode and to assess the frequency of each opcode sequence.
Detecting Unknown Malware on Android by Machine Learning Using the Feature of Dalvik Operation Code
This work proposes using Dalvik Operation Codes on Android, obtained by disassembling the application, and uses n-grams of the operation codes as features for the classification process, based on text categorization concepts.
Malicious Code Detection Using Active Learning
This work presents a complete methodology for the detection of unknown malicious code, inspired by text categorization concepts, and defines specific evaluation measures based on the well-known precision and recall measures, which reflect the accuracy of the acquisition process and the resulting improvement in the classifier.
Detection of zero-day malware based on the analysis of opcode sequences
An anomaly detection approach is proposed that addresses the problem of new malware detection, can detect previously unseen malware, and achieves a higher accuracy rate than existing analogues.
Using opcode sequences in single-class learning to detect unknown malware
The authors propose a new method that uses single-class learning to detect unknown malware families based on examining the frequencies of the appearance of opcode sequences to build a machine-learning classifier using only one set of labelled instances within a specific class of either malware or legitimate software.
Heterogeneous Opcode Space for Metamorphic Malware Detection
The proposed statistical non-signature-based detector creates two different meta-feature spaces, each comprising 25 attributes, for detecting metamorphic malware samples, and is recommended for assisting commercial AV scanners.
Detecting Malware Based on Opcode N-Gram and Machine Learning
This paper uses n-gram sizes from 1 to 15 to compare different feature selection methods, and performs experiments with different MFP (malicious files percentage) values to determine which setting performs better.
An unknown malware detection scheme based on the features of graph
The function call graph of an executable, which includes the functions and the call relations between them, is selected as the representation of the executable, and the method achieves up to 96.8% accuracy.
Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey
A framework for detecting new malicious code in executable files can be designed to achieve very high accuracy while keeping false positives (i.e., benign files misclassified as malicious) low; such a framework should include training multiple classifiers on various types of features, as well as an active learning mechanism to maintain high detection accuracy.
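Several entries above, and the imbalance study in the reference list, vary the proportion of malicious files in the training set, reported as MFP (malicious files percentage). Below is a minimal sketch of drawing a labeled training set at a fixed MFP. The function `sample_with_mfp`, sampling without replacement, and the fixed seed are illustrative assumptions, not the sampling procedure of any listed paper.

```python
import random
from typing import List, Sequence, Tuple


def sample_with_mfp(malicious: Sequence[str], benign: Sequence[str],
                    size: int, mfp: float, seed: int = 0) -> List[Tuple[str, int]]:
    """Draw `size` files, a fraction `mfp` of them malicious (hypothetical helper).

    Labels: 1 = malicious, 0 = benign. Files are sampled without replacement.
    """
    if not 0.0 <= mfp <= 1.0:
        raise ValueError("mfp must be in [0, 1]")
    n_mal = round(size * mfp)   # number of malicious files in the set
    n_ben = size - n_mal        # remainder is benign
    if n_mal > len(malicious) or n_ben > len(benign):
        raise ValueError("not enough files to satisfy the requested MFP")
    rng = random.Random(seed)   # fixed seed for reproducible splits
    chosen = [(f, 1) for f in rng.sample(list(malicious), n_mal)]
    chosen += [(f, 0) for f in rng.sample(list(benign), n_ben)]
    rng.shuffle(chosen)         # avoid label-ordered training data
    return chosen


# Hypothetical file identifiers standing in for real executables.
malicious = [f"mal_{i}" for i in range(50)]
benign = [f"ben_{i}" for i in range(500)]
train = sample_with_mfp(malicious, benign, size=200, mfp=0.10)
print(sum(label for _, label in train))  # 20
```

Holding `size` fixed while sweeping `mfp` separates the effect of class imbalance from the effect of training set size.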

References

Showing 10 of 21 references.
Unknown malcode detection via text categorization and the imbalance problem
An extensive evaluation is performed on a test collection containing more than 30,000 malicious and benign files, in which the imbalance problem is investigated; results indicate that accuracy above 95% can be achieved with a training set containing less than 20% malicious files.
Learning to Detect and Classify Malicious Executables in the Wild
The use of machine learning and data mining to detect and classify malicious executables as they appear in the wild is described, and it is suggested that the methodology could serve as the basis for an operational system for detecting previously undiscovered malicious executables.
N-gram-based detection of new malicious code
This work explores automatically detecting new malicious code using a collected dataset of benign and malicious code, obtaining 100% accuracy on the training data and 98% in 3-fold cross-validation.
Data mining methods for detection of new malicious executables
This work presents a data mining framework that detects new, previously unseen malicious executables accurately and automatically, and more than doubles the current detection rates for new malicious executables.
Learning to detect malicious executables in the wild
A fielded application for detecting malicious executables in the wild is described using techniques from machine learning and data mining; boosted decision trees outperformed other methods, with an area under the ROC curve of 0.996.
Malware prevalence in the KaZaA file-sharing network
Using a light-weight crawler built for the KaZaA file-sharing network, this work finds that over 15% of the crawled files were infected by 52 different viruses, many of which open a backdoor through which an attacker can remotely control the compromised machine, send spam, or steal a user's confidential information.
A Feature Selection and Evaluation Scheme for Computer Virus Detection
This paper presents a data mining approach that conducts an exhaustive feature search on a set of computer viruses and strives to obviate over-fitting, and evaluates the predictive power of a classifier by taking into account dependence relationships that exist between viruses.
C4.5: Programs for Machine Learning
A complete guide to the C4.5 system as implemented in C for the UNIX environment; it starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Machine learning
Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.