What’s Hidden in a Randomly Weighted Neural Network?
- Vivek Ramanujan, Mitchell Wortsman, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari
- Computer Science, Computer Vision and Pattern Recognition
- 29 November 2019
It is empirically shown that as randomly weighted neural networks with fixed weights grow wider and deeper, an "untrained subnetwork" approaches the accuracy of a network with learned weights.
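As a rough illustration of what an "untrained subnetwork" of a fixed random network means, here is a minimal sketch (an assumed selection rule, not code from the paper): the fixed random weights are gated by a binary mask derived from per-weight scores, and only the scores would ever be trained.

```python
import torch

def untrained_subnetwork(random_weight, scores, keep_fraction=0.5):
    # Keep the top-scoring fraction of the fixed random weights and zero the rest.
    # The weights stay at their random initialization; only `scores` would be learned.
    k = max(1, int(keep_fraction * scores.numel()))
    threshold = scores.flatten().topk(k).values.min()
    mask = (scores >= threshold).to(random_weight.dtype)
    return random_weight * mask
```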
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
- Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi
- Computer Science, Computer Vision and Pattern Recognition
- 3 December 2018
A self-adaptive visual navigation method (SAVN) that learns to adapt to new environments without any explicit supervision, showing major improvements in both success rate and SPL for visual navigation in novel scenes.
Robust fine-tuning of zero-shot models
- Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt
- Computer Science, Computer Vision and Pattern Recognition
- 4 September 2021
This work introduces a simple and effective method for improving robustness while fine-tuning: ensembling the weights of the zero-shot and fine-tuned models (WiSE-FT), which provides large accuracy improvements under distribution shift while preserving high accuracy on the target distribution.
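A minimal sketch of this weight-space ensembling, assuming two PyTorch models with identical architectures and floating-point parameters; the function name and the default `alpha` are illustrative.

```python
import torch

def wise_ft(zero_shot_state, finetuned_state, alpha=0.5):
    # Element-wise interpolation of two state dicts:
    # (1 - alpha) * zero-shot weights + alpha * fine-tuned weights.
    return {
        key: (1 - alpha) * zero_shot_state[key] + alpha * finetuned_state[key]
        for key in zero_shot_state
    }

# Hypothetical usage:
# model.load_state_dict(wise_ft(zero_shot.state_dict(), finetuned.state_dict(), alpha=0.7))
```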
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
- Mitchell Wortsman, Gabriel Ilharco, Ludwig Schmidt
- Computer Science, International Conference on Machine Learning
- 10 March 2022
The model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks.
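A sketch of a uniform soup, assuming the fine-tuned models share one architecture so their state dicts can be averaged key by key; the helper name is illustrative.

```python
import torch

def uniform_soup(state_dicts):
    # Average the parameters of several fine-tuned models of the same architecture.
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# souped_weights = uniform_soup([m.state_dict() for m in finetuned_models])
```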
Soft Threshold Weight Reparameterization for Learnable Sparsity
- Aditya Kusupati, Vivek Ramanujan, Ali Farhadi
- Computer Science, International Conference on Machine Learning
- 8 February 2020
STR is a simple mechanism that learns effective sparsity budgets, in contrast to popular heuristics; it boosts accuracy over existing results by up to 10% in the ultra-sparse (99%) regime and can also be used to induce low-rank (structured) sparsity in RNNs.
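A sketch of a soft-threshold reparameterization in this spirit: each layer's effective weights are the raw weights shrunk toward zero by a learnable threshold, so the sparsity level emerges from training rather than from a hand-set budget (the layer and parameter names here are assumptions, not the paper's code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftThresholdLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.s = nn.Parameter(torch.tensor(-5.0))  # learnable threshold parameter

    def forward(self, x):
        threshold = torch.sigmoid(self.s)
        # Shrink weight magnitudes by the threshold; weights below it become exactly zero.
        sparse_weight = torch.sign(self.weight) * F.relu(self.weight.abs() - threshold)
        return F.linear(x, sparse_weight)
```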
Supermasks in Superposition
- Mitchell Wortsman, Vivek Ramanujan, Ali Farhadi
- Computer Science, Neural Information Processing Systems
- 26 June 2020
The Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting, is presented; a single gradient step is often found to be sufficient to identify the correct mask, even among 2500 tasks.
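A sketch of the one-shot task-inference idea for a single linear layer, assuming fixed random weights and one binary supermask per task; the entropy-gradient rule follows the summary above, but the names and the single-layer setting are assumptions.

```python
import torch
import torch.nn.functional as F

def infer_task(x, random_weight, supermasks):
    # Superimpose all task-specific masked versions of the fixed random weights,
    # weighted by coefficients alpha, then take one gradient of the output entropy
    # with respect to alpha; the steepest entropy decrease points at the task.
    num_tasks = len(supermasks)
    alpha = torch.full((num_tasks,), 1.0 / num_tasks, requires_grad=True)
    mixed_weight = sum(a * (random_weight * m) for a, m in zip(alpha, supermasks))
    logits = x @ mixed_weight.T
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    entropy.backward()
    return int(alpha.grad.argmin())
```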
LAION-5B: An open large-scale dataset for training next generation image-text models
- Christoph Schuhmann, Romain Beaumont, J. Jitsev
- Computer Science, ArXiv
- 16 October 2022
This work presents LAION-5B, a dataset of 5.85 billion CLIP-filtered image-text pairs, of which 2.32 billion are in English; it shows successful replication and fine-tuning of foundational models such as CLIP, GLIDE, and Stable Diffusion using the dataset, and discusses further experiments enabled by an openly available dataset of this scale.
Discovering Neural Wirings
- Mitchell Wortsman, Ali Farhadi, Mohammad Rastegari
- Computer Science, Neural Information Processing Systems
- 3 June 2019
DNW provides an effective mechanism for discovering sparse subnetworks of predefined architectures in a single training run, and can be seen as unifying core aspects of the neural architecture search problem with sparse neural network learning.
Learning Neural Network Subspaces
- Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari
- Computer Science, International Conference on Machine Learning
- 20 February 2021
This work uses the subspace midpoint to boost accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging and approaching the ensemble performance of independently trained networks without the training cost.
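A sketch of the line (one-dimensional subspace) variant, assuming two endpoint parameter sets of the same architecture; the sampling scheme shown is illustrative.

```python
import torch

def point_on_line(endpoint_a, endpoint_b, t):
    # Convex combination of two state dicts: t = 0 is endpoint A, t = 1 is endpoint B.
    return {k: (1 - t) * endpoint_a[k] + t * endpoint_b[k] for k in endpoint_a}

# Training would sample t ~ Uniform(0, 1) per batch so the whole segment stays accurate;
# the midpoint t = 0.5 is what gets evaluated at test time, e.g.:
# model.load_state_dict(point_on_line(w_a, w_b, 0.5))
```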
Patching open-vocabulary models by interpolating weights
- Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt
- Computer Science, ArXiv
- 10 August 2022
PAINT, a patching method that interpolates between the weights of a model before fine-tuning and the weights after fine-tuning on the task to be patched, is introduced, demonstrating that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
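A sketch of how a patch coefficient might be chosen, assuming the same interpolation of state dicts as above; `eval_supported` and `eval_patch` are hypothetical accuracy callbacks, not part of PAINT's released code.

```python
import torch

def select_patch(pre_state, post_state, eval_supported, eval_patch,
                 mix_grid=(0.2, 0.4, 0.5, 0.6, 0.8)):
    # Sweep the mixing coefficient between the weights before and after fine-tuning
    # on the patch task, keeping the interpolation with the best combined accuracy.
    best_state, best_score = None, float("-inf")
    for alpha in mix_grid:
        candidate = {k: (1 - alpha) * pre_state[k] + alpha * post_state[k]
                     for k in pre_state}
        score = eval_supported(candidate) + eval_patch(candidate)
        if score > best_score:
            best_state, best_score = candidate, score
    return best_state
```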
...
...