Automatic web spreadsheet data extraction

@inproceedings{Chen2013AutomaticWS,
  title={Automatic web spreadsheet data extraction},
  author={Zhe Chen and Michael J. Cafarella},
  booktitle={SSW@VLDB},
  year={2013}
}
Spreadsheets contain a huge amount of high-value data but do not observe a standard data model and are thus difficult to integrate. Many data integration tools exist, but they generally work only on relational data. Existing systems for extracting relational data from spreadsheets are too labor-intensive to support ad-hoc integration tasks, in which the correct extraction target is learned only during the course of user interaction. This paper introduces a system that…

Key Quantitative Results

  • When compared to standard techniques for spreadsheet data extraction on a set of 100 random Web spreadsheets, the system reduces the amount of human labor by 72% to 92%.
