Learning to Extract Hierarchical Information from Semi-structured Documents

@inproceedings{Lam2000LearningTE,
  title={Learning to Extract Hierarchical Information from Semi-structured Documents},
  author={Wai Lam and Wai-Yip Lin},
  booktitle={CIKM},
  year={2000}
}
Existing wrapper learning methods need varying forms of assumptions or information about the document structure. Many of them can only handle documents with simple structures. To handle a richer set of semi-structured documents and minimize the burden on the user, we develop a new method, known as HISER (HIerarchical record Structure and Extraction Rule learning). Our HISER approach employs a two-stage learning task, namely, hierarchical record structure learning and extraction rule learning. In…
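The abstract names the two stages but not their representations. As a purely illustrative sketch, assume a hierarchical record structure is a tree of nested fields and that each node carries a simple left/right delimiter rule (an LR-style wrapper, a swapped-in technique that is not necessarily what HISER learns). The code shows only how the outputs of the two stages might be applied to a page, not how they are learned. All names (`Rule`, `Node`, `extract`) and the sample page are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Rule:
    """Hypothetical LR-style rule: extract every span between `left` and `right`."""
    left: str
    right: str

    def extract_all(self, text: str) -> List[str]:
        out, pos = [], 0
        while True:
            start = text.find(self.left, pos)
            if start == -1:
                break
            start += len(self.left)
            end = text.find(self.right, start)
            if end == -1:
                break
            out.append(text[start:end])
            pos = end + len(self.right)
        return out


@dataclass
class Node:
    """Hypothetical node of a hierarchical record structure.

    A leaf is an attribute; an internal node is a repeating sub-record whose
    rule delimits each occurrence and whose children are extracted inside it.
    """
    name: str
    rule: Rule
    children: List["Node"] = field(default_factory=list)


def extract(node: Node, text: str) -> List[Union[str, Dict]]:
    """Apply the rule at `node`, then recurse into each matched segment."""
    segments = node.rule.extract_all(text)
    if not node.children:
        return segments  # leaf attribute: return the raw strings
    return [{c.name: extract(c, seg) for c in node.children} for seg in segments]


# Usage: a record with a single-valued name and a multi-valued email field.
page = ("<li><b>Alice</b><i>a@x.org</i><i>alice@y.org</i></li>"
        "<li><b>Bob</b><i>bob@z.org</i></li>")
schema = Node("person", Rule("<li>", "</li>"), [
    Node("name", Rule("<b>", "</b>")),
    Node("email", Rule("<i>", "</i>")),
])
result = extract(schema, page)
print(result)
```

The tree lets a sub-record (here, a person) contain repeated attributes (several emails), which is the kind of nesting a flat, single-level wrapper cannot express.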
This paper has 20 citations.