SpanBERT: Improving Pre-training by Representing and Predicting Spans

@article{Joshi2019SpanBERTIP,
  title={SpanBERT: Improving Pre-training by Representing and Predicting Spans},
  author={Mandar Joshi and Danqi Chen and Yinhan Liu and Daniel S. Weld and Luke S. Zettlemoyer and Omer Levy},
  journal={ArXiv},
  year={2019},
  volume={abs/1907.10529}
}
We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution.
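The two ideas above can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch implementation of (1) geometric span masking and (2) the span boundary objective (SBO), using the hyperparameters reported in the paper: span lengths drawn from Geo(p = 0.2) and clipped at 10 tokens, a 15% masking budget, and a two-layer feed-forward head with GeLU activations and layer normalization. The function and module names (sample_span_mask, SpanBoundaryObjective) are ours, not the authors' released code, and the sketch masks at the subword level rather than at complete-word boundaries as in the paper.

import numpy as np
import torch
import torch.nn as nn

def sample_span_mask(seq_len, mask_ratio=0.15, p=0.2, max_span=10):
    """Mask contiguous spans (rather than individual tokens) until roughly
    mask_ratio of the sequence is covered."""
    mask = np.zeros(seq_len, dtype=bool)
    budget = int(seq_len * mask_ratio)
    while mask.sum() < budget:
        span_len = min(np.random.geometric(p), max_span)   # skewed toward short spans
        start = np.random.randint(0, max(1, seq_len - span_len))
        mask[start:start + span_len] = True
    return mask

class SpanBoundaryObjective(nn.Module):
    """Predict each masked token from the encodings of the two tokens just
    outside its span plus a relative position embedding, never from the
    masked token's own encoding."""
    def __init__(self, hidden, vocab_size, max_span=10):
        super().__init__()
        self.pos_emb = nn.Embedding(max_span, hidden)
        self.head = nn.Sequential(                          # 2-layer FFN with GeLU and layer norm
            nn.Linear(3 * hidden, hidden), nn.GELU(), nn.LayerNorm(hidden),
            nn.Linear(hidden, hidden), nn.GELU(), nn.LayerNorm(hidden),
        )
        self.vocab_proj = nn.Linear(hidden, vocab_size)

    def forward(self, enc, left_idx, right_idx, offsets):
        # enc:       (seq_len, hidden) transformer output for one sequence
        # left_idx:  (num_targets,) index of the token just before each target's span
        # right_idx: (num_targets,) index of the token just after each target's span
        # offsets:   (num_targets,) 0-based position of each target within its span
        h = torch.cat([enc[left_idx], enc[right_idx], self.pos_emb(offsets)], dim=-1)
        return self.vocab_proj(self.head(h))                # logits over the vocabulary

# Illustrative usage (encoder is any BERT-style model producing (seq_len, hidden) outputs):
# mask = sample_span_mask(seq_len=128)
# sbo = SpanBoundaryObjective(hidden=768, vocab_size=30522)
# logits = sbo(enc, left_idx=torch.tensor([9]), right_idx=torch.tensor([15]),
#              offsets=torch.tensor([2]))   # token at offset 2 inside the span 10..14

During pre-training, the SBO loss on these logits is summed with the regular masked-language-modeling loss for the same masked tokens.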


Key Quantitative Results

  • We also observe similar gains on five additional extractive question answering benchmarks (NewsQA, TriviaQA, SearchQA, HotpotQA, and Natural Questions). SpanBERT also arrives at a new state of the art on the challenging CoNLL-2012 (“OntoNotes”) shared task for document-level coreference resolution, where we reach 79.6% F1, exceeding the previous top model by 6.6% absolute.
  • Finally, on coreference resolution, SpanBERT achieves a new state of the art of 79.6% F1, improving over the previous best result of 73.0%.

Citations

Publications citing this paper.
SHOWING 1-8 OF 8 CITATIONS

BERT for Coreference Resolution: Baselines and Analysis

Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer
  • IJCNLP 2019
  • 2019
CITES METHODS & BACKGROUND

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, Mingda Chen, +3 authors Radu Soricut
  • ArXiv
  • 2019

Cross-Lingual Natural Language Generation via Pre-Training

Zewen Chi, Li Dong, +3 authors Heyan Huang
  • ArXiv
  • 2019
CITES BACKGROUND

Pretrained AI Models: Performativity, Mobility, and Change

Lav R. Varshney, Nitish Shirish Keskar, Richard Socher
  • ArXiv
  • 2019
CITES BACKGROUND

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, Myle Ott, +7 authors Veselin Stoyanov
  • ArXiv
  • 2019
CITES BACKGROUND & METHODS

Semantics-aware BERT for Language Understanding

Zhuosheng Zhang, Yuwei Wu, +4 authors Xiang Zhou
  • ArXiv
  • 2019
CITES BACKGROUND

TinyBERT: Distilling BERT for Natural Language Understanding

Xiaoqi Jiao, Yichun Yin, +5 authors Qun Liu
  • ArXiv
  • 2019

References

Publications referenced by this paper.
SHOWING 1-10 OF 52 REFERENCES

ERNIE: Enhanced Representation through Knowledge Integration

  • ArXiv
  • 2019
HIGHLY INFLUENTIAL

Automatically constructing a corpus of sentential paraphrases

Nan Yang, Wenhui Wang, +5 authors Hsiao-Wuen Hon
  • 2019
HIGHLY INFLUENTIAL