TEGRA: Table Extraction by Global Record Alignment

@inproceedings{Chu2015TEGRATE,
  title={TEGRA: Table Extraction by Global Record Alignment},
  author={Xu Chu and Yeye He and Kaushik Chakrabarti and Kris Ganjam},
  booktitle={SIGMOD Conference},
  year={2015}
}
It is well known today that pages on the Web contain a large number of content-rich relational tables. Such tables have been systematically extracted in a number of efforts to empower important applications such as table search and schema discovery. However, a significant fraction of relational tables are not embedded in the standard HTML table tags, and are thus difficult to extract. In particular, a large number of relational tables are known to be in a "list" form, which contains a list of…

Key Quantitative Results

  • Our approach considerably outperforms the state-of-the-art approaches in terms of quality, achieving over 90% F-measure across many cases.

Citations

Publications citing this paper (17 citations).

On Extracting Data from HTML Tables

CITES BACKGROUND
HIGHLY INFLUENCED

Joint repairs for web wrappers

  • 2016 IEEE 32nd International Conference on Data Engineering (ICDE)
  • 2016
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Multi-Hypothesis Parsing of Tabular Data in Comma-Separated Values (CSV) Files

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Multi-Hypothesis CSV Parsing

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Table understanding in structured documents

Martin Holeček, Antonín Hoskovec, Petr Baudiš, Pavel Klinger
  • 2019