Publications
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR: BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
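Below is a minimal sketch, not taken from the paper, of the fine-tuning recipe this TLDR describes: a pre-trained bidirectional encoder plus a single task-specific output layer. It assumes the Hugging Face transformers library, PyTorch, and the bert-base-uncased checkpoint; the classifier head and hyperparameters shown are illustrative.

```python
# Sketch (assumption, not the paper's code): pre-trained BERT encoder
# plus one additional linear output layer for a two-class task.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# The "one additional output layer": a linear classifier over the [CLS] vector.
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)

batch = tokenizer(["a sample sentence"], return_tensors="pt", padding=True)
hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] representation
logits = classifier(hidden)                        # task-specific scores

# During fine-tuning, both the encoder and the new head are updated.
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(classifier.parameters()), lr=2e-5
)
```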
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network
TLDR: A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional loglinear models.
Observed versus latent features for knowledge base and text inference
TLDR: The observed-features model is shown to be most effective at capturing the information present for entity pairs with textual relations, and combining the two model types draws on the strengths of both.
Representing Text for Joint Embedding of Text and Knowledge Bases
TLDR: A model is proposed that captures the compositional structure of textual relations and jointly optimizes entity, knowledge base, and textual relation representations; it significantly improves performance over a model that does not share parameters among textual relations with common sub-structure.
Natural Questions: A Benchmark for Question Answering Research
TLDR: The Natural Questions corpus, a question answering dataset, is presented, introducing robust metrics for evaluating question answering systems, demonstrating high human upper bounds on these metrics, and establishing baseline results using competitive methods drawn from the related literature.
Latent Retrieval for Weakly Supervised Open Domain Question Answering
TLDR: It is shown for the first time that the retriever and reader can be learned jointly from question-answer string pairs, without any IR system, outperforming BM25 by up to 19 points in exact match.
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
TLDR: Transferring from entailment data is found to be more effective than transferring from paraphrase or extractive QA data, and, surprisingly, it remains very beneficial even when starting from massive pre-trained language models such as BERT.
Cross-Sentence N-ary Relation Extraction with Graph LSTMs
TLDR: A general relation extraction framework based on graph long short-term memory networks (graph LSTMs) is explored; it extends easily to cross-sentence n-ary relation extraction, and its effectiveness is demonstrated with both conventional supervised learning and distant supervision.
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
TLDR: This work advances the state of the art in parallel sentence extraction by modeling document-level alignment, motivated by the observation that parallel sentence pairs are often found in close proximity.
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
TLDR: It is shown that pre-training remains important in the context of smaller architectures, and that fine-tuning pre-trained compact models can be competitive with more elaborate methods proposed in concurrent work.