Incorporating site-level knowledge to extract structured data from web forums

Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei Zhang, Wei-Ying Ma
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to both complex page layout designs and unrestricted user-created posts. In this paper, we study the problem of structured data extraction from various web forum sites. Our target is to find a solution as general as possible to extract structured data, such as post title, post author, post time, and post content from any…


Semantic Scholar estimates that this publication has 65 citations based on the available data.
