S2ORC: The Semantic Scholar Open Research Corpus
TLDR
S2ORC is a large corpus of 81.1M English-language academic papers spanning many academic disciplines. It is expected to facilitate research and development of tools and tasks for text mining over academic text.
GORC: A large contextual citation graph of academic papers
We introduce the Semantic Scholar Graph of References in Context (GORC), a large contextual citation graph of 81.1M academic publications, including parsed full text for 8.1M open access papers, …
PySBD: Pragmatic Sentence Boundary Disambiguation
TLDR
This work ports the Golden Rules Set (a language-specific set of sentence boundary exemplars), originally implemented as the Ruby gem pragmatic_segmenter, to Python, with additional improvements and functionality.
PAWLS: PDF Annotation With Labels and Structure
TLDR
This paper presents PDF Annotation with Labels and Structure (PAWLS), a new annotation tool designed specifically for the PDF document format, particularly suited for mixed-mode annotation and scenarios in which annotators require extended context to annotate accurately.