Evaluating Models’ Local Decision Boundaries via Contrast Sets
TLDR
A more rigorous annotation paradigm for NLP is proposed that helps to close systematic gaps in the test data; dataset authors are advised to manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.
Evaluating NLP Models via Contrast Sets
TLDR
A new annotation paradigm for NLP is proposed that helps to close systematic gaps in the test data; it is recommended that, after a dataset is constructed, the dataset authors manually perturb the test instances in small but meaningful ways that change the gold label, creating contrast sets.
Efficient Passage Retrieval with Hashing for Open-domain Question Answering
TLDR
BPR is a memory-efficient neural retrieval model that integrates a learning-to-hash technique into the state-of-the-art Dense Passage Retriever (DPR) to represent the passage index using compact binary codes rather than continuous vectors.
MultiModalQA: Complex Question Answering over Text, Tables and Images
TLDR
This paper creates MMQA, a challenging question answering dataset that requires joint reasoning over text, tables and images, and defines a formal language for taking questions that can be answered from a single modality and combining them to generate cross-modal questions.
Natural Instructions: Benchmarking Generalization to New Tasks from Natural Language Instructions
TLDR
This work uses existing NLP datasets and the instructions used to crowdsource them to create NATURALINSTRUCTIONS, a dataset of instructions and task-specific input/output data; experiments on it indicate that existing models indeed benefit from instructions and hence show improved generalization to new tasks.
FaVIQ: FAct Verification from Information-seeking Questions
TLDR
This paper constructs a realistic, large-scale fact verification dataset called FAVIQ, using information-seeking questions posed by real users who do not know how to answer; it will serve as a challenging benchmark for natural language understanding and support future progress in professional fact checking.
Noisy Channel Language Model Prompting for Few-Shot Text Classification
TLDR
A noisy channel approach is introduced for language model prompting in few-shot text classification, using channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning.
Probing Across Time: What Does RoBERTa Know and When?
TLDR
It is believed that probing-across-time analyses can help researchers understand the complex, intermingled learning that these models undergo and guide us toward more efficient approaches that accomplish necessary learning faster.
GooAQ: Open Question Answering with Diverse Answer Types
TLDR
GOOAQ is presented, a large-scale dataset with a variety of answer types, containing both textual answers (short and long) and more structured ones such as collections; it is released to facilitate further research on improving QA with diverse response types.
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
TLDR
This work introduces NATURALINSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, and adopts generative pre-trained language models to encode task-specific instructions along with the input and generate the task output.