SciBERT: A Pretrained Language Model for Scientific Text
TLDR: We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data.
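A minimal sketch (not from the paper) of loading the released SciBERT checkpoint with the Hugging Face transformers library, assuming the publicly available allenai/scibert_scivocab_uncased model and an installed PyTorch backend:

    from transformers import AutoTokenizer, AutoModel

    # Load the released SciBERT vocabulary and weights.
    tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
    model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

    # Encode a sample scientific sentence and inspect the contextual embeddings.
    inputs = tokenizer("Aspirin inhibits platelet aggregation.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)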
SciBERT: Pretrained Contextualized Embeddings for Scientific Text
TLDR: We release SciBERT, a pretrained contextualized embedding model based on BERT (Devlin et al., 2018) to address the lack of high-quality, large-scale labeled scientific data.
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
TLDR: We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of in-domain pretraining (domain-adaptive pretraining) leads to performance gains under both high- and low-resource settings.
CORD-19: The Covid-19 Open Research Dataset
TLDR: We describe the mechanics of dataset construction, highlighting challenges and key design decisions, provide an overview of how CORD-19 has been used, and preview tools and upcoming shared tasks built around the dataset.
Construction of the Literature Graph in Semantic Scholar
TLDR: We describe a deployed, scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery.
TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19
TLDR: TREC-COVID is an information retrieval (IR) shared task initiated to support clinicians and clinical research during the COVID-19 pandemic.
S2ORC: The Semantic Scholar Open Research Corpus
TLDR: We introduce S2ORC, a large corpus of 81.1M English-language academic papers spanning many academic disciplines.
TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection
TLDR: TREC-COVID is a community evaluation designed to build a test collection that captures the information needs of biomedical researchers using the scientific literature during a pandemic.
Fact or Fiction: Verifying Scientific Claims
TLDR: We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that supports or refutes a given scientific claim, and to identify rationales justifying each decision.