Big Transfer (BiT): General Visual Representation Learning

@inproceedings{Kolesnikov2020BigT,
  title={Big Transfer (BiT): General Visual Representation Learning},
  author={Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Joan Puigcerver and Jessica Yung and Sylvain Gelly and Neil Houlsby},
  booktitle={ECCV},
  year={2020}
}
Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT…
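
The recipe the abstract sketches is the classic two-stage transfer setup: pre-train on a large supervised dataset, replace the classification head for the target label space, and fine-tune the whole network. A minimal sketch in PyTorch, using an ImageNet-pretrained ResNet-50 as a stand-in for the actual BiT checkpoints; the optimizer settings here are illustrative, not the BiT-HyperRule:

# Hedged sketch of the pre-train-then-fine-tune recipe described above.
# Assumes torchvision >= 0.13; BiT's own checkpoints, architecture tweaks
# (group normalization, weight standardization) and schedule differ.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # size of the target task's label space (assumption)

# Load a backbone pre-trained on a large supervised dataset (ImageNet here).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Swap in a freshly initialized head for the new label space.
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

# Fine-tune end to end with SGD; BiT picks schedule, resolution and MixUp
# via its heuristic (the BiT-HyperRule) rather than per-task search.
optimizer = torch.optim.SGD(backbone.parameters(), lr=3e-3, momentum=0.9)
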
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations
TLDR
This work describes how to generate a dataset with over a billion images for large-scale weakly-supervised pretraining to improve the performance of visual representations, and leverages Transformers to replace the traditional convolutional backbone, with insights into both system and performance improvements, especially at 1B+ image scale.
Rethinking binary hyperparameters for deep transfer learning for image classification
The current standard for a variety of computer vision tasks using smaller numbers of labelled training examples is to fine-tune from weights pre-trained on a large image classification dataset such…
Scalable Transfer Learning with Expert Models
TLDR
This work trains a diverse set of experts by exploiting existing label structures, and uses cheap-to-compute performance proxies to select the relevant expert for each target task, and provides an adapter-based architecture able to compress many experts into a single model.
Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark
TLDR
A cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD) and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB) finds that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet.
Deep Ensembles for Low-Data Transfer Learning
TLDR
This work shows that the nature of pre-training itself is a performant source of diversity, and proposes a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset and achieves state-of-the-art performance at a much lower inference budget.
Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types
TLDR
This paper carries out an extensive experimental exploration of transfer learning across vastly different image domains (consumer photos, autonomous driving, aerial imagery, underwater, indoor scenes, synthetic, close-ups) and task types (semantic segmentation, object detection, depth estimation, keypoint detection).
Compensating for the Lack of Extra Training Data by Learning Extra Representation
TLDR
A novel framework, Extra Representation (ExRep), is introduced to surmount the problem of not having access to the JFT-300M data by instead using ImageNet and the publicly available model that has been pre-trained on JFT-300M.
Memory Efficient Meta-Learning with Large Images
TLDR
LITE, a general and memory-efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU, is proposed and used to train meta-learners, demonstrating new state-of-the-art accuracy on the real-world ORBIT benchmark and 3 of the 4 parts of the challenging VTAB+MD benchmark.
A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches
Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation of…
ConvNets vs. Transformers: Whose Visual Representations are More Transferable?
  • Hong-Yu Zhou, Chi-Ken Lu, Sibei Yang, Yizhou Yu
  • 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021
TLDR
This work systematically investigates the transfer learning ability of ConvNets and vision transformers in single-task and multi-task performance evaluations, and finds that two ViT models heavily rely on whole-network fine-tuning to achieve performance gains, while Swin Transformer does not have such a requirement.

References

Showing 1-10 of 75 references
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual…
Exploring the Limits of Weakly Supervised Pretraining
TLDR
This paper presents a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images and shows improvements on several image classification and object detection tasks, and reports the highest ImageNet-1k single-crop, top-1 accuracy to date.
Explicit Inductive Bias for Transfer Learning with Convolutional Networks
TLDR
This paper investigates several regularization schemes that explicitly promote the similarity of the final solution to the initial model, and ultimately recommends a simple $L^2$ penalty with the pre-trained model as the reference, as a baseline regularizer for transfer learning tasks.
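
The recommended regularizer anchors the fine-tuned weights to their pre-trained values rather than to zero (often called L2-SP). A hedged PyTorch sketch; the function name and the beta coefficient are illustrative assumptions:

import torch

def l2_sp_penalty(model, reference_params, beta=0.01):
    # Squared distance between the current weights and a frozen snapshot
    # of the pre-trained weights; add this to the task loss when fine-tuning.
    penalty = 0.0
    for p, p_ref in zip(model.parameters(), reference_params):
        penalty = penalty + ((p - p_ref) ** 2).sum()
    return beta * penalty

# The reference is a detached copy taken before fine-tuning starts, e.g.:
# reference_params = [p.detach().clone() for p in model.parameters()]
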
Rethinking the Inception Architecture for Computer Vision
TLDR
This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated on scaling up MobileNets and ResNet.
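
The compound coefficient ties the three scaling dimensions to a single knob phi. A small illustrative sketch using the constants reported in the EfficientNet paper (alpha=1.2, beta=1.1, gamma=1.15, chosen under the constraint alpha * beta^2 * gamma^2 ≈ 2 so that FLOPs roughly double per unit of phi):

# Compound scaling: grow depth, width and input resolution together.
alpha, beta, gamma = 1.2, 1.1, 1.15  # grid-searched on the B0 baseline

def compound_scale(phi):
    depth_mult = alpha ** phi        # more layers
    width_mult = beta ** phi         # more channels
    resolution_mult = gamma ** phi   # larger input images
    return depth_mult, width_mult, resolution_mult

print(compound_scale(2))  # multipliers for an EfficientNet-B2-sized model
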
Matching Networks for One Shot Learning
TLDR
This work employs ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories to learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.
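
The matching step described here amounts to a soft nearest-neighbour classifier: a query is labelled by attention over the embedded support set, with no gradient steps at test time. A minimal PyTorch sketch with cosine-similarity attention (tensor shapes and names are assumptions):

import torch
import torch.nn.functional as F

def matching_predict(query_emb, support_embs, support_labels, num_classes):
    # query_emb: (d,)  support_embs: (n, d)  support_labels: (n,) long
    sims = F.cosine_similarity(support_embs, query_emb.unsqueeze(0), dim=1)
    attn = F.softmax(sims, dim=0)  # attention weights over the support set
    one_hot = F.one_hot(support_labels, num_classes).float()
    return attn @ one_hot          # (num_classes,) class probabilities
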
Learning to Compare: Relation Network for Few-Shot Learning
TLDR
A conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only few examples from each, which is easily extended to zero-shot learning.
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
TLDR
It is found that performance on vision tasks increases logarithmically with the volume of training data, and it is shown that representation learning (or pre-training) still holds a lot of promise.
Revisiting Fine-tuning for Few-shot Learning
TLDR
In this study, it is shown that on the commonly used low-resolution mini-ImageNet dataset, the fine-tuning method achieves higher accuracy than common few-shot learning algorithms in the 1-shot task and nearly the same accuracy as the state-of-the-art algorithm in the 5-shot task.
A Closer Look at Few-shot Classification
TLDR
The results reveal that reducing intra-class variation is an important factor when the feature backbone is shallow, but not as critical when using deeper backbones, and a baseline method with a standard fine-tuning practice compares favorably against other state-of-the-art few-shot learning algorithms.