• Corpus ID: 233324305

Identifying botnet IP address clusters using natural language processing techniques on honeypot command logs

Valentino Crespi, Wes Hardaker, Sami Abu-El-Haija, A. G. Galstyan
Computer security has been plagued by increasingly formidable, dynamic, hard-to-detect, hard-to-predict, and hard-to-characterize hacking techniques. Such techniques are very often deployed in self-propagating worms capable of automatically infecting vulnerable computer systems and then building large bot networks, which are then used to launch coordinated attacks on designated targets. In this work, we investigate novel applications of Natural Language Processing (NLP) methods to detect and…

Learning Invariant Representations of Social Media Users

A novel procedure to learn a mapping from short episodes of user activity on social media to a vector space in which the distance between points captures the similarity of the corresponding users’ invariant features is proposed.

The Secure Shell (SSH) Protocol Assigned Numbers

This document defines the instructions to the IANA and the initial state of the IANA assigned numbers for the Secure Shell (SSH) protocol. It is intended only for the initialization of the IANA

Deep Temporal Clustering: Fully Unsupervised Learning of Time-Domain Features

A novel algorithm, Deep Temporal Clustering (DTC), is proposed to naturally integrate dimensionality reduction and temporal clustering into a single, fully unsupervised end-to-end learning framework, using time series data from diverse domains.

Deep Clustering with Convolutional Autoencoders

A convolutional autoencoder structure is developed to learn embedded features in an end-to-end way, and a clustering-oriented loss is built directly on the embedded features to jointly perform feature refinement and cluster assignment.

Temporal Patterns in Bot Activities

This paper discovers motifs, discords, joins, bursts and dynamic clusters in activities of Twitter bots, and explains the significance of these temporal patterns in gaining competitive advantage over humans.

Unsupervised Deep Embedding for Clustering Analysis

Deep Embedded Clustering is proposed, a method that uses deep neural networks to learn feature representations and cluster assignments simultaneously: it learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective.

Neural Word Embedding as Implicit Matrix Factorization

It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks, and it is conjectured that this stems from the weighted nature of SGNS's factorization.

Your botnet is my botnet: analysis of a botnet takeover

This paper reports on efforts to take control of the Torpig botnet and study its operations for a period of ten days, which provides a new understanding of the type and amount of personal information that is stolen by botnets.

A Taxonomy of Botnet Structures

We propose a taxonomy of botnet structures, based on their utility to the botmaster. We propose key metrics to measure their utility for various activities (e.g., spam, DDoS). Using these performance

Latent Dirichlet Allocation