Corpus ID: 220363944

TableBank: A Benchmark Dataset for Table Detection and Recognition

@inproceedings{Li2019TableBankAB,
  title={TableBank: A Benchmark Dataset for Table Detection and Recognition},
  author={M. Li and Lei Cui and Shaohan Huang and Furu Wei and M. Zhou and Zhoujun Li},
  year={2019}
}
  • M. Li, Lei Cui, Shaohan Huang, Furu Wei, M. Zhou, Zhoujun Li
  • Published 2019
  • Computer Science
  • We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet. Existing research on image-based table detection and recognition usually fine-tunes pre-trained models on out-of-domain data with a few thousand human-labeled examples, which makes it difficult to generalize to real-world applications. With TableBank, which contains 417K high-quality labeled tables, we build several strong baselines using…