Title extraction from bodies of HTML documents and its application to web page retrieval

@inproceedings{Hu2005TitleEF,
  title={Title extraction from bodies of HTML documents and its application to web page retrieval},
  author={Yunhua Hu and Guomao Xin and Ruihua Song and Guoping Hu and Shuming Shi and Yunbo Cao and Hang Li},
  booktitle={SIGIR},
  year={2005}
}
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, in reality HTML titles are often bogus. It is desirable to conduct automatic extraction of titles from the bodies of HTML documents. This is an issue which does not seem to have been investigated previously. In this paper, we take a supervised machine learning approach to address the problem. We propose a specification…

Citations

Semantic Scholar estimates that this publication has 78 citations based on the available data.
