Suspicious URL Filtering Based on Logistic Regression with Multi-view Analysis


Current malicious URL detection techniques that rely on whole-URL information have difficulty detecting obfuscated malicious URLs. The most precise way to identify a malicious URL is to verify the contents of the corresponding web page, but this is costly in time, traffic, and computing resources. In practice, therefore, a filtering process is needed that selects the more suspicious URLs for further verification. In this work, we propose a suspicious URL filtering approach based on multi-view analysis in order to reduce the impact of URL obfuscation techniques. A URL is composed of several portions, each with a specific use. The proposed method learns the characteristics of multiple portions (multiple views) of URLs and assigns a suspicion level to each portion. By adjusting the suspicion threshold of each portion, the proposed system selects the most suspicious URLs. We evaluate the proposed system on a real dataset from T. Co. The requirements from T. Co. are that (1) the detection rate should be less than 25%, (2) the missing rate should be lower than 25%, and (3) the processing of one hour of data should finish within an hour. The experimental results show that our approach is effective, retains more malicious URLs among the selected suspicious ones, and satisfies the requirements of a practical environment such as T. Co.'s daily operations.
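The per-portion scoring and thresholding described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the choice of three views (host, path, query), the toy lexical features in `features`, the plain gradient-descent trainer `train_logreg`, and the rule in `is_suspicious` that flags a URL when any portion reaches its threshold.

```python
import math
from urllib.parse import urlsplit

VIEWS = ("host", "path", "query")  # assumed set of URL portions


def split_views(url):
    """Split a URL into portions ("views")."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    return {"host": parts.hostname or "", "path": parts.path, "query": parts.query}


def features(text):
    """Toy lexical features for one portion (hypothetical, not the paper's)."""
    n = len(text)
    digits = sum(c.isdigit() for c in text)
    specials = sum(not c.isalnum() for c in text)
    # bias term, scaled length, digit ratio, special-character ratio
    return [1.0, n / 50.0, digits / (n or 1), specials / (n or 1)]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train_logreg(X, y, lr=0.5, epochs=300):
    """Logistic regression fitted by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = predict(w, x)
            w = [wi + lr * (t - p) * xi for wi, xi in zip(w, x)]
    return w


def train_views(urls, labels):
    """Train one independent model per URL portion."""
    models = {}
    for view in VIEWS:
        X = [features(split_views(u)[view]) for u in urls]
        models[view] = train_logreg(X, labels)
    return models


def suspicion(models, url):
    """Suspicion level in [0, 1] for each portion of the URL."""
    views = split_views(url)
    return {v: predict(w, features(views[v])) for v, w in models.items()}


def is_suspicious(models, url, thresholds):
    """Assumed rule: flag the URL if any portion reaches its threshold."""
    scores = suspicion(models, url)
    return any(scores[v] >= thresholds[v] for v in thresholds)


# Tiny made-up training set (1 = malicious, 0 = benign), for illustration only.
urls = [
    "http://example.com/index.html",
    "http://example.org/docs/guide?page=2",
    "http://x9f3k2.example.net/a/%2e%2e/login.php?id=839201&r=%3d%3d",
    "http://192.0.2.7/~u/8831/update.exe?k=00112233",
]
labels = [0, 0, 1, 1]
models = train_views(urls, labels)
scores = suspicion(models, "http://example.com/about")
```

Lowering a portion's threshold selects more URLs for verification (fewer misses, more work downstream); raising it does the opposite, which is the trade-off the stated rate requirements constrain.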

DOI: 10.1109/ASIAJCIS.2013.19

Cite this paper

@article{Su2013SuspiciousUF,
  title={Suspicious URL Filtering Based on Logistic Regression with Multi-view Analysis},
  author={Ke-Wei Su and Kuo-Ping Wu and Hahn-Ming Lee and Te-En Wei},
  journal={2013 Eighth Asia Joint Conference on Information Security},
  year={2013},
  pages={77-84}
}