Hierarchical topic segmentation of websites

Ravi Kumar, Kunal Punera, Andrew Tomkins
In this paper, we consider the problem of identifying and segmenting topically cohesive regions in the URL tree of a large website. Each page of the website is assumed to have a topic label or a distribution on topic labels generated using a standard classifier. We develop a set of cost measures characterizing the benefit accrued by introducing a segmentation of the site based on the topic labels. We propose a general framework to use these measures for describing the quality of a segmentation…
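To make the problem setting concrete, the following is a minimal sketch of segmenting a URL tree by topic labels. It is not the paper's framework or its cost measures, which are not given in this abstract. It assumes a hypothetical cost of one unit per page whose label disagrees with the label of its segment, plus a fixed `penalty` per segment, and minimizes that cost with a dynamic program over the tree. The names `build_tree`, `segment`, and `penalty` are illustrative.

```python
from collections import defaultdict


def build_tree(page_topics):
    """Build a URL tree keyed by path tuples; the root is the empty tuple."""
    children = defaultdict(set)
    labels = {}
    for url, topic in page_topics.items():
        parts = tuple(p for p in url.split("/") if p)
        labels[parts] = topic
        # Link every prefix to its parent so interior directories exist.
        for i in range(len(parts), 0, -1):
            children[parts[:i - 1]].add(parts[:i])
    return children, labels


def segment(page_topics, penalty=1.0):
    """Return the minimum total cost of a segmentation.

    Assumed cost (illustrative only): 1 per page whose topic differs from
    its segment's topic, plus `penalty` per segment created.
    """
    children, labels = build_tree(page_topics)
    topics = sorted(set(labels.values()))

    def solve(node):
        # cost[t]: best cost of this subtree if node sits in a segment labeled t.
        child_costs = [solve(c) for c in sorted(children.get(node, ()))]
        cost = {}
        for t in topics:
            # Directories without a page of their own never count as mismatches.
            total = 0.0 if labels.get(node, t) == t else 1.0
            for cc in child_costs:
                # Child either stays in the parent's segment or starts a new one.
                total += min(cc[t], min(cc.values()) + penalty)
            cost[t] = total
        return cost

    root_cost = solve(())
    # The root always opens the first segment.
    return min(root_cost.values()) + penalty
```

With a small penalty the sketch prefers to cut a topically distinct directory into its own segment; with a large penalty it keeps the site as one segment and absorbs the mismatches instead.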
This paper has 22 citations.