Looking Beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers

@inproceedings{Clark2015LookingBT,
  title={Looking Beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers},
  author={Christopher Andreas Clark and Santosh Kumar Divvala},
  booktitle={AAAI Workshop: Scholarly Big Data},
  year={2015}
}
Identifying and extracting figures and tables along with their captions from scholarly articles is important both as a way of providing tools for article summarization, and as part of larger systems that seek to gain deeper, semantic understanding of these articles. While many “off-the-shelf” tools exist that can extract embedded images from these documents, e.g. PDFBox, Poppler, etc., these tools are unable to extract tables, captions, and figures composed of vector graphics. Our proposed…
This paper has 38 citations.

From This Paper

Figures, tables, results, and topics from this paper.

Key Quantitative Results

  • Our algorithm achieves 96% precision at 92% recall when tested against this dataset, surpassing previous state of the art.

Citations

Publications citing this paper.

Convolutional Neural Networks for Figure Extraction in Historical Technical Documents

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) • 2017
