CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites

@article{Xiang2011CANTINAAF,
  title={CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites},
  author={Guang Xiang and Jason I. Hong and Carolyn Penstein Ros{\'e} and Lorrie Faith Cranor},
  journal={ACM Trans. Inf. Syst. Secur.},
  year={2011},
  volume={14},
  pages={21:1--21:28}
}
Phishing is a plague in cyberspace. Typically, phish detection methods either use human-verified URL blacklists or exploit Web page features via machine learning techniques. However, the former is frail in terms of new phish, and the latter suffers from the scarcity of effective features and the high false positive rate (FP). To alleviate those problems, we propose a layered anti-phishing solution that aims at (1) exploiting the expressiveness of a rich set of features with machine learning to… 


PhishMon: A Machine Learning Framework for Detecting Phishing Webpages

Through extensive evaluation on a dataset consisting of 4,800 distinct phishing and 17,500 distinct benign webpages, it is shown that PhishMon can distinguish unseen phishing from legitimate webpages with a very high degree of accuracy.

An Adaptive Machine Learning Based Approach for Phishing Detection Using Hybrid Features

This work develops a reliable detection system that adapts to a changing environment and to evolving phishing websites, and that does not require any third-party service.

Efficient deep learning techniques for the detection of phishing websites

Novel phishing URL detection models based on a deep neural network (DNN), long short-term memory (LSTM), and a convolutional neural network (CNN) are proposed. They use only 10 features from earlier work and achieve accuracies of 99.52% (DNN), 99.57% (LSTM), and 99.43% (CNN).

Machine Learning Techniques for Detection of Website Phishing: A Review for Promises and Challenges

The review suggests that Internet users should learn about phishing to avoid cyber-attacks. It also finds that deep learning-based techniques detect phishing websites better than conventional ML techniques.

Phishing Website Detection Based on Multidimensional Features Driven by Deep Learning

A multidimensional-feature phishing detection approach is proposed that combines deep learning with a fast detection method. The fast method uses a threshold to reduce detection time, and experimental results show that detection efficiency improves.

Towards detection of phishing websites on client-side using machine learning based approach

A novel machine learning-based anti-phishing approach extracts features on the client side only. It detects phishing websites with relatively high accuracy, achieving a 99.39% true positive rate and 99.09% overall detection accuracy.
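Results like the one above are usually reported as true positive rate and accuracy, with the false positive rate as the other key figure. The minimal sketch below shows how these rates follow from confusion-matrix counts, assuming phishing is the positive class. The counts are hypothetical and do not come from any cited paper.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Rates from confusion-matrix counts, with phishing as the positive class."""
    positives = tp + fn  # pages that really are phishing
    negatives = fp + tn  # pages that really are legitimate
    return {
        "tpr": tp / positives,  # share of phishing pages detected
        "fpr": fp / negatives,  # share of legitimate pages wrongly flagged
        "accuracy": (tp + tn) / (positives + negatives),
    }

# Hypothetical counts for illustration only; not taken from any cited paper.
print(detection_metrics(tp=95, fn=5, fp=2, tn=98))
# {'tpr': 0.95, 'fpr': 0.02, 'accuracy': 0.965}
```

A high TPR can coexist with a troublesome FPR when legitimate pages vastly outnumber phish, which is why the FP rate is tracked separately.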

Boosting the phishing detection performance by semantic analysis

This work extracts a series of semantic features with word2vec to better describe phishing sites. It then fuses them with other multi-scale statistical features to build a more robust phishing detection model.

Building Robust Phishing Detection System: an Empirical Analysis

This work proposes a simple voting-based approach to building a robust phishing page detection system. With no adversarial attack, it performs close to the native model, and it is more robust than the native model against evasion attacks.
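The core of a voting-based detector is combining several models' verdicts into one. The minimal sketch below uses plain majority voting and is not the cited paper's exact scheme. The three single-feature rules stand in for trained models, and their feature names are hypothetical. The tie-breaking policy (favour "phish") is also an assumption.

```python
from collections import Counter
from typing import Callable, Sequence

# A "classifier" here is any function mapping a feature dict to "phish" or "legit".
Classifier = Callable[[dict], str]

def majority_vote(classifiers: Sequence[Classifier], features: dict) -> str:
    """Return the label chosen by the most classifiers.

    Ties are resolved in favour of "phish" (an assumed, conservative policy).
    """
    votes = Counter(clf(features) for clf in classifiers)
    top = max(votes.values())
    winners = [label for label, n in votes.items() if n == top]
    return "phish" if "phish" in winners else winners[0]

# Three hypothetical single-feature rules standing in for trained models.
def by_ip_host(f: dict) -> str:
    return "phish" if f["host_is_ip"] else "legit"

def by_dot_count(f: dict) -> str:
    return "phish" if f["host_dot_count"] > 4 else "legit"

def by_at_symbol(f: dict) -> str:
    return "phish" if f["has_at_symbol"] else "legit"

sample = {"host_is_ip": True, "host_dot_count": 2, "has_at_symbol": True}
print(majority_vote([by_ip_host, by_dot_count, by_at_symbol], sample))  # phish
```

Voting helps against evasion because an attacker has to flip several independent models rather than one.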

Poster: PhishLex: A Proactive Zero-Day Phishing Defence Mechanism using URL Lexical Features

New lexical features are designed and a new dataset is built from the latest phishing URLs. A predictive model (PhishLex) trained on them outperforms state-of-the-art techniques, achieving higher accuracy and a lower false negative rate.
...

References

Showing 1-10 of 36 references

A Hierarchical Adaptive Probabilistic Approach for Zero Hour Phish Detection

The key insight behind the detection algorithm is to leverage existing human-verified blacklists and apply the shingling technique, a popular near-duplicate detection algorithm used by search engines, to detect phish in a probabilistic fashion with very high accuracy.
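Shingling measures how much two documents overlap by comparing their sets of contiguous word sequences. The minimal sketch below computes the Jaccard resemblance between a page and a known phish. It does not reproduce the cited paper's probabilistic, hierarchical model. The shingle width w = 4, the tokenizer, and the 0.6 threshold are assumptions for illustration.

```python
import re

def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Set of contiguous w-token shingles; shorter texts yield one shingle."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return set()
    if len(tokens) < w:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

def resemblance(a: str, b: str, w: int = 4) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two shingle sets."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(sa | sb)

known_phish = "Please verify your account password to continue banking"
candidate = "Please verify your account password to continue now"
score = resemblance(known_phish, candidate)
print(round(score, 3))  # 0.667
THRESHOLD = 0.6  # hypothetical cut-off
print("near-duplicate of known phish" if score >= THRESHOLD else "no match")
```

The two pages share 4 of their 6 distinct shingles. A phishing kit redeployed with small edits therefore still scores high against its blacklisted original.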

A hybrid phish detection approach by identity discovery and keywords retrieval

A novel hybrid phish detection method based on information extraction (IE) and information retrieval (IR) techniques. It requires no training data and no prior knowledge of phishing signatures or specific implementations, and it adapts quickly to continually emerging phishing patterns.

PhishDef: URL names say it all

This paper proposes PhishDef, a highly accurate phishing detection system that uses only URL names. It combines three properties: it is lightweight (and thus suitable for online and client-side deployment), proactive (based on online classification rather than blacklists), and resilient to inaccuracies in the training data.
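Online classification means the model updates incrementally as each labelled URL arrives, instead of being retrained in batch. The minimal sketch below uses a plain perceptron over URL tokens as a stand-in online learner. It is not PhishDef's learning algorithm. The tokenizer and the labelled URL stream are hypothetical.

```python
import re

def url_tokens(url: str) -> list[str]:
    """Lowercase alphanumeric tokens of a URL, split on any other character."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]

class OnlinePerceptron:
    """Mistake-driven linear classifier that learns from one URL at a time.

    Labels: +1 = phishing, -1 = legitimate.
    """

    def __init__(self) -> None:
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def score(self, url: str) -> float:
        return self.bias + sum(self.weights.get(t, 0.0) for t in url_tokens(url))

    def predict(self, url: str) -> int:
        return 1 if self.score(url) > 0 else -1

    def update(self, url: str, label: int) -> None:
        """Change the weights only when the current prediction is wrong."""
        if self.predict(url) != label:
            for t in url_tokens(url):
                self.weights[t] = self.weights.get(t, 0.0) + label
            self.bias += label

# Hypothetical labelled stream; a deployed system would receive these continuously.
stream = [
    ("http://paypal.com.verify-account.example.net/login", 1),
    ("https://www.example.org/about", -1),
    ("http://secure-login.example.biz/verify/account", 1),
    ("https://docs.example.org/guide", -1),
]
model = OnlinePerceptron()
for url, label in stream:
    model.update(url, label)

print(model.predict("http://account-verify.example.info/login"))  # 1
print(model.predict("https://www.example.org/contact"))  # -1
```

Each update costs time proportional to the URL's length, which is what makes URL-only, online approaches attractive for client-side deployment.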

A layout-similarity-based approach for detecting phishing pages

An extension of the AntiPhish system (called DOMAntiPhish) is presented, which leverages layout similarity information to distinguish between malicious and benign web pages and significantly reduces the false alarm rate.
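A crude proxy for layout similarity is to compare the order of HTML elements on two pages. The minimal sketch below reduces each page to its sequence of start tags and compares the sequences with difflib's SequenceMatcher. This signature and similarity measure are assumptions, not DOMAntiPhish's actual layout comparison. The sample pages are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

class TagSequence(HTMLParser):
    """Records start-tag names in document order as a crude layout signature."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # HTMLParser already lowercases tag names

def layout_signature(html: str) -> list[str]:
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags

def layout_similarity(html_a: str, html_b: str) -> float:
    """SequenceMatcher ratio in [0, 1]; 1.0 means identical tag sequences."""
    return SequenceMatcher(None, layout_signature(html_a), layout_signature(html_b)).ratio()

legit = "<html><body><form><input><input><button>Sign in</button></form></body></html>"
clone = "<html><body><form><input><input><button>Log in</button></form></body></html>"
news = "<html><body><h1>News</h1><p>Today</p></body></html>"
print(layout_similarity(legit, clone))  # 1.0
print(layout_similarity(legit, news))   # 0.4
```

A cloned login form scores 1.0 even though its visible text differs, which is the property a layout-based detector relies on.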

Learning to detect phishing emails

The method correctly identifies over 96% of phishing emails while misclassifying only on the order of 0.1% of legitimate emails. With slight modification, it also applies to detecting phishing websites or the emails used to direct victims to them.

Anomaly Based Web Phishing Page Detection

Yingyu Pan, Xuhua Ding. 2006 22nd Annual Computer Security Applications Conference (ACSAC'06), 2006.
The idea is to examine the anomalies in Web pages, in particular, the discrepancy between a Web site's identity and its structural features and HTTP transactions, which demands neither user expertise nor prior knowledge of the Web site.

A framework for detection and measurement of phishing attacks

It is found that it is often possible to tell whether or not a URL belongs to a phishing attack without requiring any knowledge of the corresponding page data.
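Classifying a URL without its page data relies on features computed from the URL string itself. The minimal sketch below extracts a few commonly used lexical and host features with the standard library. This particular feature set is illustrative and is not the one used in the cited framework.

```python
import ipaddress
from urllib.parse import urlsplit

def url_lexical_features(url: str) -> dict[str, float]:
    """Extract a few illustrative features from the URL string alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname excludes any "user@" prefix and port
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1.0
    except ValueError:
        host_is_ip = 0.0
    return {
        "url_length": float(len(url)),
        "host_dot_count": float(host.count(".")),
        "host_is_ip": host_is_ip,
        "has_at_symbol": float("@" in url),
        "hyphens_in_host": float(host.count("-")),
        "path_depth": float(len([seg for seg in parts.path.split("/") if seg])),
    }

print(url_lexical_features("http://192.168.10.5/secure/login.php"))
print(url_lexical_features("https://user@paypal-secure.example.com/a/b/c"))
```

Features like these can be fed into any classifier, including the voting and online schemes discussed in other entries.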

Visual-similarity-based phishing detection

This paper identifies and considers three page features that play a key role in making a phishing page look similar to a legitimate one and performs an experimental evaluation using a dataset composed of 41 real-world phishing pages, along with their corresponding legitimate targets.

On the Effectiveness of Techniques to Detect Phishing Sites

Over a period of three weeks, the blacklists maintained by Google and Microsoft were tested against 10,000 phishing URLs. The study also explored page properties that could be used to identify phishing pages.

Cantina: a content-based approach to detecting phishing web sites

The design, implementation, and evaluation of CANTINA, a novel, content-based approach to detecting phishing web sites, based on the TF-IDF information retrieval algorithm, are presented.
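TF-IDF scores a term highly when it is frequent on the page but rare across a background collection, so the top-scoring terms summarize what a page is about. The minimal sketch below ranks a page's terms by TF-IDF and returns the top k. The tokenizer, the smoothed IDF formula, k = 5, and the tiny background corpus are assumptions for illustration, not CANTINA's exact configuration.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def top_tfidf_terms(page: str, corpus: list[str], k: int = 5) -> list[str]:
    """Return the k terms of `page` with the highest TF-IDF weight.

    Document frequencies come from `corpus`, a hypothetical background collection.
    """
    tf = Counter(tokenize(page))
    n_docs = len(corpus)
    doc_sets = [set(tokenize(doc)) for doc in corpus]

    def idf(term: str) -> float:
        df = sum(term in s for s in doc_sets)
        return math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed IDF

    scores = {t: count * idf(t) for t, count in tf.items()}
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

corpus = [
    "The bank offers savings accounts",
    "The weather today is sunny",
    "Sign in to read the news portal",
]
page = "Sign in to your ExampleBank account. ExampleBank online banking"
print(top_tfidf_terms(page, corpus))
# ['examplebank', 'account', 'banking', 'online', 'your']
```

Common words such as "sign", "in", and "to" fall out because they also appear in the background corpus, leaving the brand name and other distinctive terms.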