Extracting Scientific Figures with Distantly Supervised Neural Networks

@inproceedings{Siegel2018ExtractingSF,
  title={Extracting Scientific Figures with Distantly Supervised Neural Networks},
  author={Noah Siegel and Nicholas Lourie and Russell Power and Waleed Ammar},
  booktitle={JCDL},
  year={2018}
}
Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for scientific figure extraction. In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention. To accomplish this we leverage the auxiliary data provided in two large web collections of scientific…

Key Quantitative Results

  • We share the resulting dataset of over 5.5 million induced labels (4,000 times larger than the previous largest figure extraction dataset), with an average precision of 96.8%, to enable the development of modern data-driven methods for this task.
