Identifying meaningful clusters in malware data

  • Renato Cordeiro de Amorim, Carlos David Lopez Ruiz
  • Expert Syst. Appl.

An extensive empirical comparison of k-means initialisation algorithms

This paper focuses on the sensitivity of k-means to its initial set of centroids, and compares 17 initialisation algorithms on 6,000 synthetic and 28 real-world data sets to show which algorithm excels in each scenario.
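The sensitivity mentioned above can be illustrated with a minimal sketch, which is not taken from the paper. It runs a plain Lloyd-style k-means from two hand-picked initial centroid sets on four toy points. The data, the function names (`sq_dist`, `mean`, `kmeans`) and both initialisations are assumptions made purely for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans(points, centroids, max_iter=100):
    """Lloyd-style k-means started from the given centroids (illustrative)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [mean(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # Within-cluster sum of squared distances (the k-means criterion).
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Toy data (assumed): the corners of a 4-by-1 rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good = kmeans(points, [(0, 0), (4, 0)])  # splits left vs right
bad = kmeans(points, [(2, 0), (2, 1)])   # stuck splitting bottom vs top

print(good[1])  # 1.0
print(bad[1])   # 16.0
```

Both runs converge, but the second stops at a local optimum whose criterion value is sixteen times worse, which is the kind of behaviour that motivates comparing initialisation algorithms.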



Extremely scalable storage and clustering of malware metadata

  • Matthew Asquith
  • Computer Science
    Journal of Computer Virology and Hacking Techniques
  • 2015
This paper proposes the use of a data structure called an aggregation overlay graph to reduce the total volume of metadata by more than an order of magnitude without any loss of information, and describes a way of grouping similar samples that can handle extremely high volumes.

Performance Evaluation of Features and Clustering Algorithms for Malware

A real-world, ground-truth dataset and multiple metrics are used to evaluate the performance of several algorithms, distance functions and sets of features used to study the malware clustering problem systematically.

Data clustering: 50 years beyond K-means

A brief overview of clustering is provided, well-known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.

Malware analysis performance enhancement using cloud computing

A new approach to enhancing malware analyzer performance is introduced that integrates cloud computing features with the malware analyzer and can reduce the time taken to detect new malware in the wild.

Maximizing accuracy in multi-scanner malware detection systems

Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering

Learning from Big Malwares

This paper analyzes two fundamental characteristics of Windows executable malware from VirusTotal, the largest real malware repository, to show that malware appears in bursts and that distributions of malware are highly skewed.

Malware Analysis and Classification: A Survey

This survey paper provides an overview of techniques for analyzing and classifying malware and finds that behavioral patterns obtained either statically or dynamically can be exploited to detect and classify unknown malware into known families using machine learning techniques.

Some methods for classification and analysis of multivariate observations

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give