Web page classification using n-gram based URL features

@inproceedings{Rajalakshmi2013WebPC,
  title={Web page classification using n-gram based URL features},
  author={R. Rajalakshmi and C. Aravindan},
  booktitle={2013 Fifth International Conference on Advanced Computing (ICoAC)},
  year={2013},
  pages={15--21}
}
  • R. Rajalakshmi, C. Aravindan
  • Published 2013
  • Computer Science
  • 2013 Fifth International Conference on Advanced Computing (ICoAC)
  • The exponential increase in the number of web pages on the World Wide Web poses a great challenge for information filtering and makes topic-focused crawling a time-consuming way to search for relevant information. [...] In the proposed approach, character n-gram based features are extracted from URLs alone, and classification is done by Support Vector Machines and Maximum Entropy classifiers. The performance of the system was evaluated on two benchmark datasets, viz. ODP with 2 million URLs…
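
The abstract describes extracting character n-gram features from the URL alone. The following is a minimal sketch of that idea, not the authors' implementation: the function name `url_char_ngrams`, the n-gram range of 3 to 5, the lowercasing, and the choice to drop the URL scheme are all assumptions made for illustration, since the excerpt does not specify them.

```python
from collections import Counter
from urllib.parse import urlsplit


def url_char_ngrams(url, n_min=3, n_max=5):
    """Count overlapping character n-grams in a URL.

    Hypothetical helper: the n-gram range and the preprocessing
    (lowercasing, dropping the scheme) are illustrative assumptions.
    """
    parts = urlsplit(url.lower())
    # Assumption: the scheme ("http", "https") carries no topical signal.
    text = parts.netloc + parts.path
    if parts.query:
        text += "?" + parts.query

    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the URL text.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


features = url_char_ngrams("http://www.example.com/sports/news")
print(features.most_common(5))
```

The resulting counts form a sparse feature vector per URL. Such vectors could then be passed to a linear SVM or a maximum entropy (multinomial logistic regression) classifier, the two model families named in the abstract.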
    Supervised Term Weighting Methods for URL Classification
    Identifying Health Domain URLs using SVM
    Web image size prediction for efficient focused image crawling
