Corpus ID: 16285667

Looking Beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers

  • Christopher Clark, S. Divvala
  • Published in AAAI Workshop: Scholarly Big Data
  • Computer Science
  • Identifying and extracting figures and tables along with their captions from scholarly articles is important both as a way of providing tools for article summarization and as part of larger systems that seek to gain a deeper, semantic understanding of these articles. [...] Key Method: This method can extract a wide variety of figures because it does not make strong assumptions about the format of the figures embedded in the document, as long as they can be differentiated from the main article's text.


    PDFFigures 2.0: Mining figures from research papers
    • 43
    Extracting Scientific Figures with Distantly Supervised Neural Networks
    • 24
    SideNoter: Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
    • 11
    Scalable algorithms for scholarly figure mining and semantics
    • 8

