A Unifying Approach to HTML Wrapper Representation and Learning

@inproceedings{Grieser2000AUA,
  title={A Unifying Approach to HTML Wrapper Representation and Learning},
  author={Gunter Grieser and Klaus P. Jantke and Steffen Lange and Bernd Thomas},
  booktitle={Discovery Science},
  year={2000}
}
The number, the size, and the dynamics of Internet information sources bear abundant evidence of the need for automation in information extraction. This calls for representation formalisms that match the World Wide Web reality and for learning approaches and learnability results that apply to these formalisms. The concept of elementary formal systems is appropriately generalized to allow for the representation of wrapper classes which are relevant to the description of Internet sources in…
Citations

Publications citing this paper.
SHOWING 1-10 OF 19 CITATIONS

Syntactic Folding and its Application to the Information Extraction from Web Pages

  • FLAIRS Conference
  • 2001
CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

A Survey on Region Extractors from Web Documents

  • IEEE Transactions on Knowledge and Data Engineering
  • 2013

A solution for invalid source link in anti-plagiarism system based on web archive

  • 2010 International Conference on Intelligent Control and Information Processing
  • 2010
CITES BACKGROUND

Knowledge Federation over the Web Based on Meme Media Technologies

  • Federation over the Web
  • 2005
CITES BACKGROUND

References

Publications referenced by this paper.
SHOWING 1-10 OF 17 REFERENCES

Language identification in the limit

M. E. Gold
  • Information and Control
  • 1967
HIGHLY INFLUENTIAL

Foundations of logic programming

HIGHLY INFLUENTIAL

Learning Elementary Formal Systems

HIGHLY INFLUENTIAL

Anti-unification based learning of T-Wrappers for information extraction

B. Thomas
  • Proc. of AAAI Workshop on Machine Learning for IE
  • 1999
