PAWLS: PDF Annotation With Labels and Structure

Mark Neumann, Zejiang Shen, Sam Skjonsberg
Adobe’s Portable Document Format (PDF) is a popular way of distributing view-only documents with a rich visual markup. This presents a challenge to NLP practitioners who wish to use the information contained within PDF documents for training models or data analysis, because annotating these documents is difficult. In this paper, we present PDF Annotation with Labels and Structure (PAWLS), a new annotation tool designed specifically for the PDF document format. PAWLS is particularly suited for… 

