Corpus ID: 8856335

Large-Scale Automatic Classification of Phishing Pages

@inproceedings{Whittaker2010LargeScaleAC,
  title={Large-Scale Automatic Classification of Phishing Pages},
  author={Colin Whittaker and Brian Ryner and Marria Nazif},
  booktitle={NDSS},
  year={2010}
}
Phishing websites, fraudulent sites that impersonate a trusted third party to gain access to private data, continue to cost Internet users over a billion dollars each year. [...] Our classifier analyzes millions of pages a day, examining the URL and the contents of a page to determine whether or not a page is phishing. Unlike previous work in this field, we train the classifier on a noisy dataset consisting of millions of samples from previously collected live classification data. Despite the noise…
On the Character of Phishing URLs: Accurate and Robust Statistical Learning Classifiers
TLDR
Using a two-sample Kolmogorov-Smirnov test along with other features, accuracy of phishing URL classification can be greatly increased through the use of these statistical measures.
A real-time automatic detection of phishing URLs
  • Jianyi Zhang, Yonghao Wang
  • Computer Science
  • Proceedings of 2012 2nd International Conference on Computer Science and Network Technology
  • 2012
TLDR
Some new aspects of the common features that appear in phishing URLs are revealed, and a statistical machine learning classifier that relies on these selected features is introduced to detect phishing sites.
Examination of data, rule generation and detection of phishing URLs using online logistic regression
  • M. Feroz, S. Mengel
  • Computer Science
  • 2014 IEEE International Conference on Big Data (Big Data)
  • 2014
TLDR
An approach that classifies URLs automatically based on their lexical and host-based features, and achieves 93-97% accuracy by detecting a large number of phishing hosts, while maintaining a modest false positive rate is described.
Two-Pronged Phish Snagging
TLDR
The implementation framework, called PhishSnag, operates between a user's mail transfer agent and mail user agent and processes each arriving email for phishing attacks even before it reaches the inbox, and significantly outperforms the previous unsupervised and supervised phishing detection schemes for emails in the literature.
Learning to Detect Phishing Webpages
TLDR
This work proposes many novel content-based features and applies cutting-edge machine learning techniques to demonstrate that this approach can detect phishing webpages with error rates 0.04-0.44%, false positive and false negative rates of 0.0-0% on real-world data sets using a Random Forests classifier, thereby improving previous results on the important problem of phishing detection.
LEARNING TO DETECT PHISHING URLs
Phishing attacks have been on the rise, and performing certain actions such as mouse hovering, clicking, etc. on malicious URLs may cause unsuspecting Internet users to fall victim to identity theft…
Large-Scale Lexical Classification of Phishing Websites
TLDR
This study investigates the use of machine learning for phishing detection, with features extracted from the URL only, and builds a large-scale lexical classifier, Poseidon, that is able to accelerate the classification of phishing sites, reducing the load on a more expensive classification process by 99%.
Phishing URL Detection Using URL Ranking
TLDR
This paper describes an approach that classifies URLs automatically based on their lexical and host-based features, and achieves 93-98% accuracy by detecting a large number of phishing hosts, while maintaining a modest false positive rate.
A Novel Approach for Phishing URLs Detection
Seeking sensitive user data in the form of online banking user-id and passwords or credit card information, which may then be used by ‘phishers’ for their own personal gain is the primary objective…
Detecting Phishing Sites Using URLs Collected from Emails
Phishing is the malicious behavior of stealing personal information from computer users. It is a very popular account-theft method among cyber criminals. Hence, developing a new approach to solve…

References

SHOWING 1-10 OF 36 REFERENCES
Learning to detect phishing emails
TLDR
This method is applicable, with slight modification, to detection of phishing websites, or the emails used to direct victims to these sites, and correctly identifies over 96% of the phishing emails while misclassifying only on the order of 0.1% of the legitimate emails.
A framework for detection and measurement of phishing attacks
Phishing is a form of identity theft that combines social engineering techniques and sophisticated attack vectors to harvest financial information from unsuspecting consumers. Often a phisher tries to…
Cantina: a content-based approach to detecting phishing web sites
TLDR
The design, implementation, and evaluation of CANTINA, a novel, content-based approach to detecting phishing web sites, based on the TF-IDF information retrieval algorithm, are presented.
Why phishing works
TLDR
This paper provides the first empirical evidence about which malicious strategies are successful at deceiving general users by analyzing a large set of captured phishing attacks and developing a set of hypotheses about why these strategies might work.
On the Effectiveness of Techniques to Detect Phishing Sites
TLDR
Over a period of three weeks, the effectiveness of the blacklists maintained by Google and Microsoft was tested with 10,000 phishing URLs, and the existence of page properties that can be used to identify phishing pages was explored.
Beyond blacklists: learning to detect malicious web sites from suspicious URLs
TLDR
This paper describes an approach to this problem based on automated URL classification, using statistical methods to discover the tell-tale lexical and host-based properties of malicious Web site URLs.
Identifying suspicious URLs: an application of large-scale online learning
TLDR
It is demonstrated that recently-developed online algorithms can be as accurate as batch techniques, achieving classification accuracies up to 99% over a balanced data set.
An Empirical Analysis of Phishing Blacklists
TLDR
This paper used 191 fresh phish that were less than 30 minutes old to conduct two tests on eight anti-phishing toolbars and found that two tools using heuristics to complement blacklists caught significantly more phish initially than those using only blacklists.
Decision strategies and susceptibility to phishing
TLDR
Preliminary analysis of interviews with 20 non-expert computer users to reveal their strategies and understand their decisions when encountering possibly suspicious emails suggests that people can manage the risks that they are most familiar with, but don't appear to extrapolate to be wary of unfamiliar risks.
Phinding Phish: Evaluating Anti-Phishing Tools
TLDR
An automated test bed for testing anti-phishing tools is developed and it is demonstrated that the source of phishing URLs and the freshness of the URLs tested can significantly impact the results of anti-phishing tool testing.