On the complexity of schema inference from web pages in the presence of nullable data attributes

@inproceedings{Yang2003OnTC,
  title={On the complexity of schema inference from web pages in the presence of nullable data attributes},
  author={Guizhen Yang and I. V. Ramakrishnan and Michael Kifer},
  booktitle={CIKM},
  year={2003}
}
An increasingly large number of Web pages are machine-generated by filling in templates with data stored in backend databases. These templates can be viewed as the implicit schemas of those Web pages. The ability to infer the implicit schema from a collection of Web pages is important for scalable data extraction, since the inferred schema can be used to automatically identify schema attributes that are "encoded" in Web pages. However, the task of inferring a "good" schema is complicated due to…
