A practical approach for clustering large data flows of malicious URLs


Over the last couple of years there has been a substantial increase in malicious attacks that use the Internet as an infection vector. One solution to counter this problem is to implement a filter at the network connection level. Because of the large amount of data that has to be filtered in real time, any practical approach has to take into account both memory usage and performance limitations in order to deliver a fast response time. This paper presents a cloud-based mechanism that can filter large amounts of network traffic while respecting both memory and response-time limitations. The algorithms have been tested on data flows of more than 750 million URLs per day. We address several practical problems, such as storage, computation time and the clustering of large data flows. Finally, we present statistical results obtained over a period of two months.
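To put the stated volume in perspective, the short sketch below converts 750 million URLs per day into an average arrival rate and an average per-URL time budget. It assumes, purely for illustration, that traffic is spread evenly over 24 hours and that URLs are divided evenly across parallel workers. Real traffic has peaks well above the average, so these figures are a lower bound on the required throughput. The function names are hypothetical and do not come from the paper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(urls_per_day: int) -> float:
    """Average URLs per second, assuming (hypothetically) a uniform spread over the day."""
    return urls_per_day / SECONDS_PER_DAY


def per_url_budget_us(urls_per_day: int, workers: int = 1) -> float:
    """Average time budget per URL in microseconds when `workers` process URLs in parallel.

    Each worker only has to keep up with 1/workers of the stream, so its budget
    per URL grows linearly with the number of workers (an idealized assumption).
    """
    return workers * 1_000_000 / average_rate(urls_per_day)


if __name__ == "__main__":
    daily = 750_000_000  # volume reported in the abstract
    print(f"{average_rate(daily):,.0f} URLs/s on average")
    print(f"{per_url_budget_us(daily):.1f} us per URL for a single worker")
```

Under these assumptions a single sequential filter would have roughly 115 microseconds per URL on average (about 8,681 URLs per second), which illustrates why memory usage and response time have to be considered together.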

DOI: 10.1007/s11416-015-0239-x


11 Figures and Tables


@article{Popescu2015APA, title={A practical approach for clustering large data flows of malicious URLs}, author={Adrian-Stefan Popescu and Dragos Gavrilut and Daniel-Ionut Irimia}, journal={Journal of Computer Virology and Hacking Techniques}, year={2015}, volume={12}, pages={37-47} }