Publications
Natural Questions: A Benchmark for Question Answering Research
TLDR
The Natural Questions corpus, a question answering data set, is presented, along with robust metrics for evaluating question answering systems, demonstrations of high human upper bounds on these metrics, and baseline results using competitive methods drawn from the related literature.
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
TLDR
It is found that transferring from entailment data is more effective than transferring from paraphrase or extractive QA data, and that it, surprisingly, continues to be very beneficial even when starting from massive pre-trained language models such as BERT.
Matching the Blanks: Distributional Similarity for Relation Learning
TLDR
This paper builds on extensions of Harris’ distributional hypothesis to relations, as well as recent advances in learning text representations (specifically, BERT), to build task-agnostic relation representations solely from entity-linked text.
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
TLDR
A quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora are presented.
Scaling Semantic Parsers with On-the-Fly Ontology Matching
TLDR
A new semantic parsing approach is presented that learns to resolve ontological mismatches; it is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logical-form meaning representations, and includes an ontology-matching model that adapts the output logical forms to each target ontology.
Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification
TLDR
This paper uses higher-order unification to define a hypothesis space containing all grammars consistent with the training data, and develops an online learning algorithm that efficiently searches this space while simultaneously estimating the parameters of a log-linear parsing model.
Lexical Generalization in CCG Grammar Induction for Semantic Parsing
TLDR
An algorithm for learning factored CCG lexicons, along with a probabilistic parse-selection model, is presented; the factored lexicons include both lexemes to model word meaning and templates to model systematic variation in word usage.
Learning Recurrent Span Representations for Extractive Question Answering
TLDR
This paper presents a novel model architecture that efficiently builds fixed length representations of all spans in the evidence document with a recurrent network, and shows that scoring explicit span representations significantly improves performance over other approaches that factor the prediction into separate predictions about words or start and end markers.
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
TLDR
This paper introduces query-agnostic indexable representations of document phrases that can drastically speed up open-domain QA, and introduces dense-sparse phrase encoding, which effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents.
...