Knowledge Discovery from Semistructured Texts

@inproceedings{Sakamoto2002KnowledgeDF,
  title={Knowledge Discovery from Semistructured Texts},
  author={Hiroshi Sakamoto and Hiroki Arimura and Setsuo Arikawa},
  booktitle={Progress in Discovery Science},
  year={2002}
}
This paper surveys our recent results on knowledge discovery from semistructured texts, which contain heterogeneous structures represented by labeled trees. The aim of our study is to extract useful information from documents on the Web. First, we present theoretical results on learning rewriting rules between labeled trees. Second, we apply our method to learning HTML trees in the framework of wrapper induction. We also examine our algorithms for real-world HTML documents and…

Citations

Publications citing this paper.

A New Path Generalization Algorithm for HTML Wrapper Induction

  • Advances in Web Intelligence and Data Mining
  • 2006
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Extracting Document Structure to Facilitate a Knowledge Base Creation for The UML Superstructure Specification

  • Fourth International Conference on Information Technology (ITNG'07)
  • 2007
CITES BACKGROUND

Learning Logic Wrappers for Information Extraction from the Web

  • 2005 Symposium on Applications and the Internet Workshops (SAINT 2005 Workshops)
  • 2005
CITES METHODS & BACKGROUND