Publications
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
TLDR
This work investigates how the performance of the best-found model varies as a function of the number of fine-tuning trials, and examines two factors influenced by the choice of random seed: weight initialization and training data order.
Evaluating Models’ Local Decision Boundaries via Contrast Sets
TLDR
This work proposes a more rigorous annotation paradigm for NLP that helps to close systematic gaps in the test data, recommending that dataset authors manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.
General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping
TLDR
This work defines the normalized Dynamic Time Warping (nDTW) metric, which is naturally sensitive to the order of the nodes composing each path, is suited for both continuous and graph-based evaluations, and can be efficiently calculated. It also defines SDTW, which constrains nDTW to successful paths only.
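As a rough illustration of the nDTW idea summarized above, here is a minimal Python sketch that computes DTW between two paths with Euclidean point costs and applies an exponential normalization exp(-DTW / (|R| · d_th)); the function names, the default threshold, and the endpoint-based success check in sdtw are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def ndtw(reference, query, d_th=3.0):
    """Minimal sketch of normalized Dynamic Time Warping between two paths.

    reference, query: arrays of shape (n, d) and (m, d) holding path points.
    d_th: success threshold distance used to normalize the accumulated DTW cost.
    """
    reference, query = np.asarray(reference, float), np.asarray(query, float)
    n, m = len(reference), len(query)
    # Standard O(n*m) DTW dynamic program with Euclidean point costs.
    dtw = np.full((n + 1, m + 1), np.inf)
    dtw[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(reference[i - 1] - query[j - 1])
            dtw[i, j] = cost + min(dtw[i - 1, j], dtw[i, j - 1], dtw[i - 1, j - 1])
    # Map the accumulated cost into (0, 1]; higher is better.
    return float(np.exp(-dtw[n, m] / (n * d_th)))

def sdtw(reference, query, d_th=3.0):
    """SDTW sketch: nDTW gated by success (query endpoint within d_th of the goal)."""
    success = np.linalg.norm(np.asarray(query[-1], float) - np.asarray(reference[-1], float)) <= d_th
    return ndtw(reference, query, d_th) if success else 0.0

Reordering the query's nodes lowers the score even when the set of visited points is unchanged, which is the order-sensitivity the metric is designed to capture.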
Large-scale representation learning from visually grounded untranscribed speech
TLDR
This work describes a scalable method to automatically generate diverse audio for image captioning datasets via a dual encoder that learns to align latent representations from both modalities, and shows that a masked margin softmax loss for such models is superior to the standard triplet loss.
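The masked margin softmax idea can be sketched, very loosely, as a batched softmax contrastive loss over a dual encoder's similarity matrix, with a margin subtracted from the matched (diagonal) pairs and an optional mask for spurious off-diagonal positives. The sketch below (PyTorch) is a generic member of that family; the margin value, the masking scheme, and all names are assumptions, not the paper's exact loss.

import torch
import torch.nn.functional as F

def masked_margin_softmax_loss(audio_emb, image_emb, margin=0.1, dup_mask=None):
    """Sketch of a batch softmax contrastive loss with a margin on the positives.

    audio_emb, image_emb: (B, D) L2-normalized embeddings; row i of each is a
    matched pair. dup_mask: optional (B, B) bool tensor marking off-diagonal
    entries that are actually positives (e.g. duplicates) and should be ignored.
    """
    sim = audio_emb @ image_emb.t()                                   # (B, B) similarity matrix
    sim = sim - margin * torch.eye(sim.size(0), device=sim.device)    # require positives to win by a margin
    if dup_mask is not None:
        sim = sim.masked_fill(dup_mask, float('-inf'))                # mask spurious off-diagonal positives
    targets = torch.arange(sim.size(0), device=sim.device)
    # Symmetric cross-entropy: audio-to-image and image-to-audio retrieval.
    return 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))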
Finetuning Pretrained Transformers into RNNs
TLDR
This work proposes a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, the softmax attention is replaced with a linear-complexity recurrent alternative, and the model is then finetuned. This provides an improved tradeoff between efficiency and accuracy over the standard transformer and other recurrent variants.
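As an illustrative sketch of the kind of swap involved, the snippet below replaces softmax attention with a kernelized attention whose cost is linear in sequence length (here using a simple elu(x)+1 feature map borrowed from the linear-transformer literature); the paper's actual feature map, its learned parameters, and the exact causal formulation used for decoding are not reproduced here.

import torch
import torch.nn.functional as F

def feature_map(x):
    # A simple positive feature map; an assumption, not the paper's learned map.
    return F.elu(x) + 1.0

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized (non-causal) attention with linear complexity in sequence length.

    q, k, v: (B, T, D). Equivalent to attending with phi(q) @ phi(k)^T weights,
    but computed as phi(q) @ (phi(k)^T v), so cost is O(T * D^2) rather than O(T^2 * D).
    """
    q, k = feature_map(q), feature_map(k)
    kv = torch.einsum('btd,bte->bde', k, v)                          # sum_t phi(k_t) v_t^T
    z = 1.0 / (torch.einsum('btd,bd->bt', q, k.sum(dim=1)) + eps)    # per-query normalizer
    return torch.einsum('btd,bde,bt->bte', q, kv, z)

For causal decoding, the same computation can be carried as a running state (the accumulated phi(k)^T v matrix and the phi(k) sum updated token by token), which is what lets the finetuned model behave like an RNN at inference time.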
Documenting the English Colossal Clean Crawled Corpus
TLDR
This work provides some of the first documentation of the English Colossal Clean Crawled Corpus (C4), one of the largest corpora of text available, and hosts an indexed version of C4 at https://c4-search.allenai.org/, allowing anyone to search it.
Evaluating NLP Models via Contrast Sets
TLDR
A new annotation paradigm for NLP is proposed that helps to close systematic gaps in the test data; it is recommended that, after a dataset is constructed, the dataset authors manually perturb the test instances in small but meaningful ways that change the gold label, creating contrast sets.
MultiModalQA: Complex Question Answering over Text, Tables and Images
TLDR
This paper creates MMQA, a challenging question answering dataset that requires joint reasoning over text, tables, and images, and defines a formal language for taking questions that can be answered from a single modality and combining them to generate cross-modal questions.
Probing Text Models for Common Ground with Visual Representations
TLDR
It is found that representations from models trained on purely textual data, such as BERT, can be nontrivially mapped to those of a vision model, and that the context surrounding objects in sentences greatly impacts performance.
Contrasting Contrastive Self-Supervised Representation Learning Models
TLDR
This paper analyzes contrastive approaches as one of the most successful and popular variants of self-supervised representation learning, examining over 700 training experiments spanning 30 encoders, 4 pre-training datasets, and 20 diverse downstream tasks.