Topical Host Reputation for Lightweight Url Classification

@inproceedings{Kolcz2010TopicalHR,
  title={Topical Host Reputation for Lightweight Url Classification},
  author={Aleksander Kolcz and Geoff Hulten and Jakub Szymanski},
  year={2010}
}
Classification of URLs into topical categories is an important task in data mining and information filtering. In many applications the task must be performed with minimal information, which usually means just the URL itself. While using only the URL is surprisingly effective for some topics, there is still a substantial loss in accuracy compared to basing the classification on full web page content. In this work we stipulate that the basic URL-based approach can be…

Citations

Publications citing this paper.

Recall estimation for rare topic retrieval from large corpuses

  • 2014 IEEE International Conference on Big Data (Big Data)
  • 2014

References

Publications referenced by this paper.

Web page classification without the web page

Link-based Classification

Support vector machines classification with very large scale taxonomy

T.-Y. Liu, Y. Yang, H. Wan
  • SIGKDD Explorations, 7(1):36-43
  • 2005

PageRank, HITS and a unified framework for link analysis

C. Ding, X. He, P. Husbands, H. Zha, H. D. Simon
  • In SIGIR ’02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 353-354, New York, NY, USA
  • 2002
