SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning

Colorado Reed, Sean L. Metzger, A. Srinivas, Trevor Darrell, Kurt Keutzer
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A common practice in unsupervised representation learning is to use labeled data to evaluate the quality of the learned representations. This supervised evaluation is then used to guide critical aspects of the training process such as selecting the data augmentation policy. However, guiding an unsupervised training process through supervised evaluations is not possible for real-world data that does not actually contain labels (which may be the case, for example, in privacy sensitive fields such… 

Expressiveness and Learnability: A Unifying View for Evaluating Self-Supervised Learning
Through a large-scale empirical study with a diverse family of SSL algorithms, it is found that CLID better correlates with in-distribution model performance than other competing recent evaluation schemes.
On Pre-Training for Federated Learning
In most of the literature on federated learning (FL), neural networks are initialized with random weights. In this paper, we present an empirical study on the effect of pre-training on FL.
A Survey of Automated Data Augmentation Algorithms for Deep Learning-based Image Classification Tasks
This survey discusses the underlying reasons for the emergence of AutoDA technology from the perspective of image classification, and identifies three key components of a standard AutoDA model: a search space, a search algorithm, and an optimal DA policy.
Multi-Augmentation for Efficient Visual Representation Learning for Self-supervised Pre-training
MA-SSRL successfully learns the invariant feature representation and presents an efficient, effective, and adaptable data augmentation pipeline for self-supervised pre-training on different distribution and domain datasets.
Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap
The theory suggests an alternative understanding of contrastive learning: the role of aligning positive samples is more like a surrogate task than an ultimate goal, and the overlapped augmented views create a ladder for contrastive learning to gradually learn class-separated representations.
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
A data augmentation optimization method based on the adversarial strategy, called TeachAugment, is proposed. By leveraging a teacher model, it can produce informative transformed images for the model without requiring careful tuning, and it outperforms existing methods in experiments on image classification, semantic segmentation, and unsupervised representation learning tasks.
Self-Supervised Pretraining Improves Self-Supervised Pretraining
Hierarchical PreTraining (HPT) is explored, which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model, and provides a simple framework for obtaining better pretrained representations with less computational resources.
Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning
This work proposes to generalize the MSF algorithm by constraining the search space for nearest neighbors, and shows that this method outperforms MSF in the SSL setting when the constraint utilizes a different augmentation of an image, and outperforms PAWS in the semi-supervised setting with less training resources when the constraint ensures the NNs have the same pseudolabel as the query.
Contrastive Representation Learning with Trainable Augmentation Channel
This work formalizes a stochastic encoding process in which there exists a tug-of-war between the data corruption introduced by the augmentations and the information preserved by the encoder, and shows that it can learn a data-dependent distribution of augmentations to avoid the collapse of the representation.
Learning neural decoders without labels using multiple data streams
This work learns neural decoders without labels by leveraging multiple simultaneously recorded data streams, including neural, kinematic, and physiological signals by applying cross-modal, self-supervised deep clustering to decode movements from brain recordings.


The PASCAL Visual Object Classes Challenge
Learning Deep Features for Scene Recognition using Places Database
A new scene-centric database called Places with over 7 million labeled pictures of scenes is introduced with new methods to compare the density and diversity of image datasets and it is shown that Places is as dense as other scene datasets and has more diversity.
RandAugment: Practical data augmentation with no separate search
RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous learned augmentation approaches on CIFAR-10, CIFAR-100, SVHN, and ImageNet.
AutoAugment: Learning Augmentation Strategies From Data
This paper describes a simple procedure called AutoAugment to automatically search for improved data augmentation policies, which achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
Fast AutoAugment
This paper proposes an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching, which speeds up the search by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets.
Revisiting Self-Supervised Visual Representation Learning
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.
Improved Baselines with Momentum Contrastive Learning
With simple modifications to MoCo, this note establishes stronger baselines that outperform SimCLR and do not require large training batches, and hopes this will make state-of-the-art unsupervised learning research more accessible.
A Simple Framework for Contrastive Learning of Visual Representations
It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a
Scaling and Benchmarking Self-Supervised Visual Representation Learning
It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.