SciBERT: A Pretrained Language Model for Scientific Text

@inproceedings{Beltagy2019SciBERTAP,
  title={SciBERT: A Pretrained Language Model for Scientific Text},
  author={Iz Beltagy and Kyle Lo and Arman Cohan},
  booktitle={EMNLP/IJCNLP 2019},
  year={2019}
}
Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al., 2018), to address the lack of high-quality, large-scale labeled scientific data. SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence…
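
As a minimal usage sketch (not taken from the paper): the snippet below loads SciBERT through the Hugging Face transformers library and encodes one sentence to get contextual embeddings. The model identifier allenai/scibert_scivocab_uncased and the example sentence are assumptions for illustration; consult the authors' release for the official weights.

    from transformers import AutoTokenizer, AutoModel

    # Load the pretrained SciBERT encoder and its in-domain vocabulary.
    # "allenai/scibert_scivocab_uncased" is assumed to be the uncased,
    # scientific-vocabulary variant available on the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
    model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

    # Encode an example scientific sentence and obtain contextual token embeddings.
    inputs = tokenizer(
        "The patients were administered 50 mg of propranolol daily.",
        return_tensors="pt",
    )
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)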

Citations

Publications citing this paper.

Pretrained Language Models for Sequential Sentence Classification

Arman Cohan, Iz Beltagy, Daniel King, Bhavana Dalvi, Daniel S. Weld
  • EMNLP/IJCNLP 2019