Publications
RoBERTa: A Robustly Optimized BERT Pretraining Approach
TLDR
It is found that BERT was significantly undertrained and that, with improved training, it can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
TLDR
It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, exhibits considerable syntactic and lexical variability between questions and their corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.
SpanBERT: Improving Pre-training by Representing and Predicting Spans
TLDR
The approach extends BERT by masking contiguous random spans rather than random tokens, and by training the span boundary representations to predict the entire content of the masked span without relying on the individual token representations within it.
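To make the span-masking idea concrete, here is a minimal sketch (not the authors' implementation) of masking contiguous random spans in a token sequence; the geometric span-length distribution, the 15% masking budget, and the helper names are simplifying assumptions for illustration.

```python
import random

def sample_span_length(p=0.2, max_len=10):
    # Geometric span length: count Bernoulli(p) trials until success, clipped.
    length = 1
    while random.random() > p and length < max_len:
        length += 1
    return length

def mask_contiguous_spans(tokens, mask_token="[MASK]", budget_ratio=0.15):
    """Mask contiguous random spans until roughly budget_ratio of the tokens
    are masked. Illustrative only; subword and word-boundary handling from
    the paper are simplified away."""
    tokens = list(tokens)
    budget = int(len(tokens) * budget_ratio)
    masked, attempts = set(), 0
    while len(masked) < budget and attempts < 10 * len(tokens):
        attempts += 1
        span_len = min(sample_span_length(), budget - len(masked))
        start = random.randrange(0, len(tokens) - span_len + 1)
        span = range(start, start + span_len)
        if any(i in masked for i in span):
            continue  # skip overlapping spans in this simple sketch
        for i in span:
            tokens[i] = mask_token
            masked.add(i)
    return tokens
```

Masking whole spans, rather than isolated tokens, is what forces the model to reconstruct multi-token content from the span's boundary representations, which the span boundary objective described above exploits.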
BERT for Coreference Resolution: Baselines and Analysis
TLDR
A qualitative analysis of model predictions indicates that, compared to ELMo and BERT-base, BERT-large is particularly better at distinguishing between related but distinct entities, but that there is still room for improvement in modeling document-level context, conversations, and mention paraphrasing.
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
TLDR
Pairwise embeddings of word pairs are computed as a compositional function of each word's representation, which is learned by maximizing the pointwise mutual information (PMI) with the contexts in which the two words co-occur.
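As a rough illustration of that objective, the sketch below scores a composed word-pair representation against a context using an SGNS-style negative-sampling loss, whose optimum is known to be tied to PMI; the composition network, the mean-pooled context encoder, and all names here are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairContextScorer(nn.Module):
    """Sketch: compose two word embeddings into a pair representation and
    score it against a context representation."""
    def __init__(self, vocab_size, dim=300):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.ctx_emb = nn.Embedding(vocab_size, dim)
        # Compositional function over the two word representations.
        self.compose = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, x, y, context):
        pair = self.compose(torch.cat([self.word_emb(x), self.word_emb(y)], dim=-1))
        # Mean-pooled bag-of-words context; the paper's context encoder differs.
        ctx = self.ctx_emb(context).mean(dim=1)
        return (pair * ctx).sum(dim=-1)  # higher score = pair fits this context

def sgns_loss(pos_scores, neg_scores):
    """Negative-sampling loss: push observed (pair, context) scores up and
    sampled negatives down; maximizing this is a standard proxy for PMI."""
    return -(F.logsigmoid(pos_scores).mean() + F.logsigmoid(-neg_scores).mean())
```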
An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction
TLDR
This paper shows that the trade-off between rationale conciseness and end-task performance can be better managed by optimizing a bound on the Information Bottleneck (IB) objective, and derives a learning objective that allows direct control of mask sparsity levels through a tunable sparse prior.
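For intuition, here is a hedged sketch of a variational-IB style training loss with a tunable sparse prior over token masks; the exact bound, the Bernoulli prior form, and the variable names below are assumptions for illustration, not the paper's derivation.

```python
import torch

def ib_rationale_loss(task_nll, mask_probs, pi=0.2, beta=1.0):
    """task_nll: negative log-likelihood of the label given the masked input.
    mask_probs: per-token Bernoulli probabilities of keeping each token.
    The KL term pulls the mask posterior toward a sparse Bernoulli(pi) prior;
    pi and beta are the knobs that control how concise the rationale is."""
    p = mask_probs.clamp(1e-6, 1 - 1e-6)
    kl = p * torch.log(p / pi) + (1 - p) * torch.log((1 - p) / (1 - pi))
    return task_nll + beta * kl.sum(dim=-1).mean()
```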
Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling
TLDR
This work proposes a pragmatic evaluation methodology that assumes access only to raw text rather than gold mentions, disregards singleton prediction, and addresses typical targeted settings in CD coreference resolution.
Contextualized Representations Using Textual Encyclopedic Knowledge
TLDR
It is shown that integrating background knowledge from text is effective for tasks focused on factual reasoning and allows direct reuse of powerful pretrained BERT-style encoders, and that knowledge integration can be further improved with suitable pretraining via a self-supervised masked language model objective over words in background-augmented input text.
Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries
TLDR
A novel technique is proposed to segment a telegraphic query and assign a coarse-grained purpose to each segment: a base entity e1, a relation type r, a target entity type t2, or contextual words s.
Cross-document Coreference Resolution over Predicted Mentions
TLDR
This work introduces the first end-to-end model for CD coreference resolution from raw text, which extends the prominent model for within-document coreference to the CD setting and achieves competitive results for event and entity coreference resolution on gold mentions.