Learning Robust Web Wrappers

@inproceedings{Fazzinga2005LearningRW,
  title={Learning Robust Web Wrappers},
  author={Bettina Fazzinga and Sergio Flesca and Andrea Tagarelli},
  booktitle={DEXA},
  year={2005}
}
A main challenge in wrapping web data is making wrappers robust to variations in HTML sources while reducing human effort as much as possible. In this paper we develop a new approach to speed up the specification of robust wrappers, sparing the wrapper designer from defining extraction rules in detail. The key idea is to enable a schema-based wrapping system to automatically generalize an original wrapper with respect to a set of example HTML documents. To accomplish this objective, we…
