Extracting Scientific Figures with Distantly Supervised Neural Networks

@article{Siegel2018ExtractingSF,
  title={Extracting Scientific Figures with Distantly Supervised Neural Networks},
  author={Noah Siegel and Nicholas Lourie and R. Power and Waleed Ammar},
  journal={Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries},
  year={2018}
}
Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for scientific figure extraction. [...] We share the resulting dataset of over 5.5 million induced labels (4,000 times larger than the previous largest figure extraction dataset), with an average precision of 96.8%, to enable the development of modern data-driven methods for this task.
Document Domain Randomization for Deep Learning Document Layout Extraction
Robust PDF Document Conversion Using Recurrent Neural Networks
TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables
