Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
- Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt
- Computer Science, International Conference on Machine Learning
- 10 March 2022
The model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks.
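The core operation behind model soups is a uniform element-wise average of the parameters of several fine-tuned models. A minimal sketch, assuming parameters are stored as NumPy arrays keyed by name (the function name and toy weights are illustrative, not from the paper's code):

```python
import numpy as np

def model_soup(state_dicts):
    """Uniform soup: average each named parameter across fine-tuned models."""
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0)
            for k in state_dicts[0]}

# Two hypothetical fine-tuned models sharing one weight matrix "w".
a = {"w": np.array([[1.0, 2.0], [3.0, 4.0]])}
b = {"w": np.array([[3.0, 4.0], [5.0, 6.0]])}
soup = model_soup([a, b])
print(soup["w"])  # element-wise mean of the two weight matrices
```

Because the soup is a single set of weights, inference cost is identical to a single model, unlike ensembling.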
Documenting the English Colossal Clean Crawled Corpus
This work provides some of the first documentation of the English Colossal Clean Crawled Corpus (C4), one of the largest corpora of text available, and hosts an indexed version of C4 at https://c4-search.allenai.org/, allowing anyone to search it.
Evaluating NLP Models via Contrast Sets
A new annotation paradigm for NLP is proposed that helps to close systematic gaps in the test data, and it is recommended that after a dataset is constructed, the dataset authors manually perturb the test instances in small but meaningful ways that change the gold label, creating contrast sets.
Patching open-vocabulary models by interpolating weights
PAINT, a patching method that interpolates between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched, is introduced, demonstrating that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
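The interpolation at the heart of PAINT is a convex combination of the zero-shot and fine-tuned weights, controlled by a mixing coefficient. A minimal sketch, assuming dict-of-array parameters (names and values are illustrative):

```python
import numpy as np

def paint_interpolate(theta_zeroshot, theta_finetuned, alpha):
    """Patched weights: (1 - alpha) * zero-shot + alpha * fine-tuned, per parameter."""
    return {k: (1 - alpha) * theta_zeroshot[k] + alpha * theta_finetuned[k]
            for k in theta_zeroshot}

zeroshot = {"w": np.array([0.0, 2.0])}
finetuned = {"w": np.array([4.0, 6.0])}
patched = paint_interpolate(zeroshot, finetuned, alpha=0.5)
print(patched["w"])  # weights midway between zero-shot and fine-tuned
```

Sweeping alpha trades off accuracy on the patched task against accuracy retained on the original tasks.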
Probing Text Models for Common Ground with Visual Representations
It is found that representations from models trained on purely textual data, such as BERT, can be nontrivially mapped to those of a vision model, and the context surrounding objects in sentences greatly impacts performance.
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
- T. Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt
- Computer Science, ArXiv
- 10 August 2022
It is demonstrated that simply gathering a large amount of data from the web is not the most effective way to build a pre-training dataset for robust generalization, necessitating further study into dataset design.
Contrasting Contrastive Self-Supervised Representation Learning Models
- Klemen Kotar, Gabriel Ilharco, Ludwig Schmidt, Kiana Ehsani, Roozbeh Mottaghi
- Computer Science, ArXiv
This paper analyzes contrastive approaches as one of the most successful and popular variants of self-supervised representation learning and examines over 700 training experiments including 30 encoders, 4 pre-training datasets and 20 diverse downstream tasks.
High Performance Natural Language Processing
- Gabriel Ilharco, Cesar Ilharco, Iulia Turc, Tim Dettmers, F. Ferreira, Kenton Lee
- Computer Science, Conference on Empirical Methods in Natural…
- 1 November 2020
This cutting-edge tutorial will recapitulate the state-of-the-art in natural language processing with scale in perspective, and cover a wide range of techniques for improving efficiency, including knowledge distillation, quantization, pruning, and more efficient architectures, along with case studies and practical implementation tricks.
Reproducible scaling laws for contrastive language-image learning
It is found that the training distribution plays a key role in scaling laws, as the OpenAI and OpenCLIP models exhibit different scaling behavior despite identical model architectures and similar training recipes.
Editing Models with Task Arithmetic
This work proposes a new paradigm for steering the behavior of neural networks, centered around task vectors, and shows that task arithmetic is a simple, efficient and effective way of editing models.
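A task vector is the element-wise difference between fine-tuned and pre-trained weights; adding or negating scaled task vectors edits the model's behavior. A minimal sketch, assuming dict-of-array parameters (function names and toy values are illustrative):

```python
import numpy as np

def task_vector(theta_pre, theta_ft):
    """tau = theta_ft - theta_pre: the direction of fine-tuning on one task."""
    return {k: theta_ft[k] - theta_pre[k] for k in theta_pre}

def apply_task_vector(theta_pre, tau, scale=1.0):
    """Add a task's behavior (scale > 0) or negate it (scale < 0)."""
    return {k: theta_pre[k] + scale * tau[k] for k in theta_pre}

pre = {"w": np.array([1.0, 1.0])}
ft = {"w": np.array([3.0, 0.0])}
tau = task_vector(pre, ft)
edited = apply_task_vector(pre, tau, scale=-1.0)  # negation steers away from the task
```

Summing task vectors from several tasks before applying them is what the title's "arithmetic" refers to.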