Web page classification without the web page

@inproceedings{Kan2004WebPC,
  title={Web page classification without the web page},
  author={Min-Yen Kan},
  booktitle={WWW},
  year={2004}
}
Uniform resource locators (URLs), which mark the address of a resource on the World Wide Web, are often human-readable and can hint at the category of the resource. This paper explores the use of URLs for webpage categorization via a two-phase pipeline of word segmentation/expansion and classification. We quantify its performance against document-based methods, which require the retrieval of the source document. 
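
The two-phase idea in the abstract (segment the URL into words, then classify from those words alone) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the vocabulary `VOCAB`, the keyword table `CATEGORY_KEYWORDS`, and the function names are all hypothetical. A simple keyword-overlap score stands in for a trained classifier, and the abbreviation expansion step is omitted.

```python
import re
from urllib.parse import urlparse

# Hypothetical toy vocabulary; a real system would need a much larger lexicon.
VOCAB = {"computer", "science", "faculty", "staff", "course", "courses",
         "student", "students", "home", "page", "people"}

# Hypothetical category keywords, used as a stand-in for a trained classifier.
CATEGORY_KEYWORDS = {
    "course": {"course", "courses", "class", "syllabus"},
    "faculty": {"faculty", "staff", "professor"},
    "student": {"student", "students"},
}


def split_url(url):
    """Phase 1a: break the host and path into lowercase alphabetic chunks."""
    parsed = urlparse(url)
    text = parsed.netloc + parsed.path
    return [chunk.lower() for chunk in re.findall(r"[A-Za-z]+", text)]


def segment(chunk, vocab=VOCAB):
    """Phase 1b: split a concatenated chunk into the fewest vocabulary words.

    Uses dynamic programming over prefix lengths. If the chunk cannot be
    fully covered by vocabulary words, it is returned unchanged.
    """
    n = len(chunk)
    best = [None] * (n + 1)  # best[i] = shortest word list covering chunk[:i]
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            word = chunk[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [chunk]


def url_tokens(url):
    """Run both segmentation steps and return the flat token list."""
    tokens = []
    for chunk in split_url(url):
        tokens.extend(segment(chunk))
    return tokens


def classify(url):
    """Phase 2: pick the category whose keywords overlap the tokens most.

    Returns None when no category keyword appears in the URL.
    """
    tokens = url_tokens(url)
    scores = {category: sum(token in keywords for token in tokens)
              for category, keywords in CATEGORY_KEYWORDS.items()}
    best_category = max(scores, key=scores.get)
    return best_category if scores[best_category] > 0 else None
```

Under these assumptions, `classify("http://www.example.edu/computerscience/courses/index.html")` yields `"course"` without ever fetching the page, which is the property the paper compares against document-based methods.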
Semantic Scholar estimates that this publication has 73 citations based on the available data.