PDFFigures 2.0: Mining figures from research papers

Christopher Andreas Clark and Santosh Kumar Divvala
2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)
Figures and tables are key sources of information in many scholarly documents. However, current academic search engines do not make use of figures and tables when semantically parsing documents or presenting document summaries to users. To facilitate these applications, we develop an algorithm called "PDFFigures 2.0" that extracts figures, tables, and captions from documents. Our proposed approach analyzes the structure of individual pages by detecting captions, graphical elements, and chunks of…
This paper has 18 citations.

From This Paper

Results and topics from this paper.

Key Quantitative Results

  • Our algorithm achieves 94% precision at 90% recall on this dataset, surpassing the previous state of the art.
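
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how precision and recall are computed from match counts. The function name and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the paper, and the rule for deciding when an extracted figure matches a ground-truth figure is not specified here, so it is left out.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw match counts.

    precision = correct extractions / all extractions returned
    recall    = correct extractions / all ground-truth items
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (assumption, not the paper's data):
# 90 correct extractions, 10 spurious ones, 30 ground-truth items missed.
p, r = precision_recall(90, 10, 30)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.90 recall=0.75
```

Reporting precision "at" a given recall, as in the result above, fixes one operating point on the trade-off between the two: a stricter extractor returns fewer spurious figures but misses more real ones.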


Publications citing this paper.

Extracting and Retargeting Color Mappings from Bitmap Images of Visualizations

IEEE Transactions on Visualization and Computer Graphics • 2018

A Data Driven Approach for Compound Figure Separation Using Convolutional Neural Networks

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR) • 2017

Big Scholarly Data: A Survey

IEEE Transactions on Big Data • 2017


Publications referenced by this paper.

Automatic Extraction of Figures from Scientific Publications in High-Energy Physics

P. A. Praczyk, J. Nogueras-Iso
Information Technology and Libraries • 2013
