Title extraction from bodies of HTML documents and its application to web page retrieval

@inproceedings{Hu2005TitleEF,
  title={Title extraction from bodies of HTML documents and its application to web page retrieval},
  author={Yunhua Hu and Guomao Xin and Ruihua Song and Guoping Hu and Shuming Shi and Yunbo Cao and Hang Li},
  booktitle={SIGIR},
  year={2005}
}
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, in reality HTML titles are often bogus. It is desirable to conduct automatic extraction of titles from the bodies of HTML documents. This is an issue which does not seem to have been investigated previously. In this paper, we take a supervised machine learning approach to address the problem. We propose a specification…

Citations

Publications citing this paper.
SHOWING 1-10 OF 46 CITATIONS

Using linguistic features to automatically extract web page title

The Determination of Cluster Number at k-Mean Using Elbow Method and Purity Evaluation on Headline News

  • 2018 International Seminar on Application for Technology of Information and Communication
  • 2018
