Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
- Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt
- Computer Science, International Conference on Machine Learning
- 10 March 2022
The model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks.
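The core recipe (the "uniform soup") is simple to sketch in PyTorch; the helper below is illustrative and is not the paper's released code:

```python
# A minimal sketch of a "uniform soup": average the parameters of several
# models fine-tuned from the same initialization. Function and variable
# names here are illustrative, not from the paper's code.
import torch

def uniform_soup(state_dicts):
    """Average a list of state dicts with identical keys and shapes."""
    soup = {}
    for key in state_dicts[0]:
        soup[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return soup

# Usage: load N fine-tuned checkpoints, average them, and evaluate a single
# model, so inference cost is identical to one model.
# model.load_state_dict(uniform_soup([torch.load(p) for p in checkpoint_paths]))
```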
Documenting the English Colossal Clean Crawled Corpus
- Jesse Dodge, Maarten Sap, Matt Gardner
- Computer Science, ArXiv
- 2021
This work provides some of the first documentation of the English Colossal Clean Crawled Corpus (C4), one of the largest corpora of text available, and hosts an indexed version of C4 at https://c4-search.allenai.org/, allowing anyone to search it.
Evaluating NLP Models via Contrast Sets
- Matt Gardner, Yoav Artzi, Ben Zhou
- Computer Science, ArXiv
- 6 April 2020
A new annotation paradigm for NLP is proposed that helps to close systematic gaps in the test data, and it is recommended that after a dataset is constructed, the dataset authors manually perturb the test instances in small but meaningful ways that change the gold label, creating contrast sets.
Patching open-vocabulary models by interpolating weights
- Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt
- Computer Science, ArXiv
- 10 August 2022
PAINT, a patching method that interpolates between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched, is introduced, demonstrating that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
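The interpolation itself is a one-liner over the weight dictionaries; a minimal sketch in the spirit of PAINT, with illustrative names (`theta_zs`, `theta_ft`, `alpha` are not taken from the paper's code):

```python
# Weight-space patching: mix the weights before fine-tuning (theta_zs) with
# the weights after fine-tuning on the patching task (theta_ft). The mixing
# coefficient `alpha` is typically chosen on held-out data.
import torch

def interpolate_weights(theta_zs, theta_ft, alpha):
    """Return (1 - alpha) * theta_zs + alpha * theta_ft, key by key."""
    return {
        k: (1 - alpha) * theta_zs[k].float() + alpha * theta_ft[k].float()
        for k in theta_zs
    }
```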
Probing Text Models for Common Ground with Visual Representations
- Gabriel Ilharco, Rowan Zellers, Ali Farhadi, Hannaneh Hajishirzi
- Computer Science, ArXiv
- 1 May 2020
It is found that representations from models trained on purely textual data, such as BERT, can be nontrivially mapped to those of a vision model, and the context surrounding objects in sentences greatly impacts performance.
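A hedged sketch of the probing idea: fit a linear map from text-model features to vision-model features and check how well it transfers. The tensors below are random placeholders, and the paper's actual probing setup differs in detail:

```python
# Fit W so that text_feats @ W approximates vision_feats, then test whether
# each mapped text feature retrieves its own paired visual feature.
import torch
import torch.nn.functional as F

text_feats = torch.randn(1000, 768)    # e.g., features from a text model like BERT
vision_feats = torch.randn(1000, 512)  # e.g., features from a vision model

# Closed-form least-squares solution for the linear map.
W = torch.linalg.lstsq(text_feats, vision_feats).solution
pred = text_feats @ W

# Retrieval check: does each mapped text feature land nearest its own image?
sims = F.normalize(pred, dim=1) @ F.normalize(vision_feats, dim=1).t()
top1 = (sims.argmax(dim=1) == torch.arange(len(pred))).float().mean()
```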
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
- T. Nguyen, Gabriel Ilharco, Mitchell Wortsman, Sewoong Oh, Ludwig Schmidt
- Computer Science, ArXiv
- 10 August 2022
It is demonstrated that simply gathering a large amount of data from the web is not the most effective way to build a pre-training dataset for robust generalization, necessitating further study into dataset design.
Contrasting Contrastive Self-Supervised Representation Learning Models
- Klemen Kotar, Gabriel Ilharco, Ludwig Schmidt, Kiana Ehsani, Roozbeh Mottaghi
- Computer Science, ArXiv
- 2021
This paper analyzes contrastive approaches, one of the most successful and popular variants of self-supervised representation learning, examining over 700 training experiments spanning 30 encoders, 4 pre-training datasets, and 20 diverse downstream tasks.
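The common core of the methods compared is an InfoNCE-style contrastive loss; the sketch below is a generic version of that loss, not any specific method's implementation:

```python
# Generic InfoNCE loss: embeddings of two augmented views of the same batch
# are pulled together on the diagonal and pushed apart off-diagonal.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two views of the same N examples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (N, N) similarity matrix
    labels = torch.arange(z1.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```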
High Performance Natural Language Processing
- Gabriel Ilharco, Cesar Ilharco, Iulia Turc, Tim Dettmers, F. Ferreira, Kenton Lee
- Computer Science, Conference on Empirical Methods in Natural Language Processing
- 1 November 2020
This cutting-edge tutorial recapitulates the state-of-the-art in natural language processing with scale in perspective, covering a wide range of techniques for improving efficiency, including knowledge distillation, quantization, pruning, and more efficient architectures, along with case studies and practical implementation tricks.
Reproducible scaling laws for contrastive language-image learning
- Mehdi Cherti, Romain Beaumont, J. Jitsev
- Computer Science, ArXiv
- 14 December 2022
It is found that the training distribution plays a key role in scaling laws as the OpenAI and OpenCLIP models exhibit different scaling behavior despite identical model architectures and similar training recipes.
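A scaling law of this kind is typically fit by regressing log error against log compute to estimate a power law error ≈ a · C^(−b); the sketch below uses placeholder data points, not results from the paper:

```python
# Fit a power law error ≈ a * C**(-b) from (compute, error) measurements.
import numpy as np

compute = np.array([1e9, 1e10, 1e11, 1e12])  # e.g., training FLOPs (placeholder)
error = np.array([0.42, 0.33, 0.26, 0.21])   # e.g., downstream error (placeholder)

slope, intercept = np.polyfit(np.log(compute), np.log(error), deg=1)
b, a = -slope, np.exp(intercept)
print(f"error ~ {a:.3g} * C^(-{b:.3g})")
```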
Editing Models with Task Arithmetic
- Gabriel Ilharco, Marco Tulio Ribeiro, Ali Farhadi
- Computer Science, Psychology, ArXiv
- 8 December 2022
This work proposes a new paradigm for steering the behavior of neural networks, centered around task vectors, and shows that task arithmetic is a simple, efficient and effective way of editing models.
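The mechanics are easy to sketch: a task vector is the element-wise difference between fine-tuned and pre-trained weights, and editing is addition or negation of that vector. The helpers below are illustrative, with names not taken from the paper's code:

```python
# Task vectors: tau = theta_ft - theta_pre. Adding a scaled tau steers a
# model toward a task; a negative scale suppresses the associated behavior.
import torch

def task_vector(theta_pre, theta_ft):
    """Element-wise difference of two state dicts with matching keys."""
    return {k: theta_ft[k].float() - theta_pre[k].float() for k in theta_pre}

def apply_task_vector(theta_pre, tau, scale=1.0):
    """theta_new = theta_pre + scale * tau; scale < 0 negates the task."""
    return {k: theta_pre[k].float() + scale * tau[k] for k in theta_pre}
```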
...