SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning

@inproceedings{Reed2021SelfAugmentAA,
  title={SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning},
  author={Colorado Reed and Sean Metzger and A. Srinivas and Trevor Darrell and Kurt Keutzer},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={2673--2682}
}
A common practice in unsupervised representation learning is to use labeled data to evaluate the quality of the learned representations. This supervised evaluation is then used to guide critical aspects of the training process, such as selecting the data augmentation policy. However, guiding an unsupervised training process through supervised evaluations is not possible for real-world data that does not actually contain labels (which may be the case, for example, in privacy-sensitive fields such… 
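The "supervised evaluation" the abstract refers to is typically a linear probe: freeze the encoder, fit a linear classifier on its features using the labels, and read off held-out accuracy. A minimal NumPy sketch on synthetic stand-in features (the function name, ridge penalty, and data are illustrative assumptions, not from the paper):

```python
import numpy as np

def linear_probe_accuracy(train_x, train_y, test_x, test_y):
    """Fit a ridge-regularized least-squares linear classifier on frozen
    features (one-hot targets) and report accuracy on held-out features."""
    classes = np.unique(train_y)
    onehot = (train_y[:, None] == classes[None, :]).astype(float)
    lam = 1e-3  # small ridge penalty for numerical stability
    xtx = train_x.T @ train_x + lam * np.eye(train_x.shape[1])
    w = np.linalg.solve(xtx, train_x.T @ onehot)
    preds = classes[np.argmax(test_x @ w, axis=1)]
    return float(np.mean(preds == test_y))

# Hypothetical stand-in for frozen encoder features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
labels = (feats[:, 0] > 0).astype(int)  # separable along one axis by construction
acc = linear_probe_accuracy(feats[:100], labels[:100], feats[100:], labels[100:])
```

The paper's point is that this probe needs labels; when none exist, some label-free proxy must stand in for `acc` when ranking augmentation policies.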

Citations of this paper

Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap
TLDR
The theory suggests an alternative understanding of contrastive learning: the role of aligning positive samples is more like a surrogate task than an ultimate goal, and the overlapped augmented views create a ladder for contrastive learning to gradually learn class-separated representations.
Self-Supervised Pretraining Improves Self-Supervised Pretraining
TLDR
Hierarchical PreTraining (HPT) is explored, which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model, and provides a simple framework for obtaining better pretrained representations with less computational resources.
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
TLDR
A data augmentation optimization method based on the adversarial strategy called TeachAugment is proposed, which can produce informative transformed images to the model without requiring careful tuning by leveraging a teacher model and outperforms existing methods in experiments of image classification, semantic segmentation, and unsupervised representation learning tasks.
Constrained Mean Shift Using Distant Yet Related Neighbors for Representation Learning
TLDR
This work proposes to generalize the MSF algorithm by constraining the search space for nearest neighbors, and shows that this method outperforms MSF in the SSL setting when the constraint utilizes a different augmentation of an image, and outperforms PAWS in the semi-supervised setting with fewer training resources when the constraint ensures the NNs have the same pseudo-label as the query.
Contrastive Representation Learning with Trainable Augmentation Channel
TLDR
This work formalizes a stochastic encoding process in which there exists a tug-of-war between the data corruption introduced by the augmentations and the information preserved by the encoder, and shows that it can learn a data-dependent distribution of augmentations to avoid the collapse of the representation.
Learning neural decoders without labels using multiple data streams
TLDR
This work learns neural decoders without labels by leveraging multiple simultaneously recorded data streams (neural, kinematic, and physiological signals), applying cross-modal, self-supervised deep clustering to decode movements from brain recordings.
No True State-of-the-Art? OOD Detection Methods are Inconsistent across Datasets
TLDR
A distance-based method is proposed, Pairwise OOD detection (POD), which is based on Siamese networks and improves over Mahalanobis by sidestepping the expensive covariance estimation step.
On Feature Decorrelation in Self-Supervised Learning
TLDR
The existence of complete collapse is verified, along with another reachable collapse pattern that is usually overlooked, namely dimensional collapse, which is connected with strong correlations between axes and serves as a strong motivation for feature decorrelation.
Predicting with Confidence on Unseen Distributions
TLDR
This investigation determines that common distributional distances, such as Fréchet distance or Maximum Mean Discrepancy, fail to induce reliable estimates of performance under distribution shift, and finds that the proposed difference of confidences (DoC) approach yields successful estimates of a classifier’s performance over a variety of shifts and model architectures.
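The difference-of-confidences idea can be sketched roughly as follows: predict target accuracy as source accuracy minus the drop in mean confidence between source and target data. The function name and the use of raw max-softmax confidences are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def predicted_target_accuracy(source_acc, source_confs, target_confs):
    """Difference-of-confidences (DoC) sketch: estimate accuracy under a
    distribution shift as source accuracy minus the drop in mean
    max-softmax confidence between source and target inputs."""
    doc = np.mean(source_confs) - np.mean(target_confs)
    return source_acc - doc
```

The appeal is that the target side needs only model confidences, never target labels.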
Region Similarity Representation Learning
TLDR
Through object detection, instance segmentation, and dense pose estimation experiments, this work illustrates how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.

References

Showing 1-10 of 49 references
The PASCAL Visual Object Classes Challenge
Learning Deep Features for Scene Recognition using Places Database
TLDR
A new scene-centric database called Places, with over 7 million labeled pictures of scenes, is introduced, along with new methods to compare the density and diversity of image datasets; it is shown that Places is as dense as other scene datasets and has more diversity.
AutoAugment: Learning Augmentation Strategies From Data
TLDR
This paper describes a simple procedure called AutoAugment to automatically search for improved data augmentation policies, which achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
Fast AutoAugment
TLDR
This paper proposes an algorithm called Fast AutoAugment that finds effective augmentation policies via a more efficient search strategy based on density matching, which speeds up the search time by orders of magnitude while achieving comparable performance on image recognition tasks with various models and datasets.
RandAugment: Practical data augmentation with no separate search
TLDR
RandAugment can be used uniformly across different tasks and datasets and works out of the box, matching or surpassing all previous learned augmentation approaches on CIFAR-10, CIFAR-100, SVHN, and ImageNet.
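RandAugment's search-free recipe reduces to two integers: sample N transforms uniformly at random and apply each at a shared magnitude M. A toy sketch with hypothetical stand-in operations (real policies use image transforms such as shear, solarize, or posterize):

```python
import random

# Hypothetical stand-in operations acting on a number, used here only
# to illustrate the sampling scheme.
OPS = {
    "add": lambda x, m: x + m,
    "scale": lambda x, m: x * (1 + m / 10),
    "negate": lambda x, m: -x,
}

def rand_augment(x, n=2, m=3, rng=random):
    """Apply n uniformly sampled ops, each at the shared magnitude m."""
    for name in rng.choices(list(OPS), k=n):
        x = OPS[name](x, m)
    return x
```

Because the whole policy is just the pair (N, M), no separate search phase is needed; the two integers can be tuned directly on the downstream task.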
Revisiting Self-Supervised Visual Representation Learning
TLDR
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.
A Simple Framework for Contrastive Learning of Visual Representations
TLDR
It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
Improved Baselines with Momentum Contrastive Learning
TLDR
With simple modifications to MoCo, this note establishes stronger baselines that outperform SimCLR and do not require large training batches, and hopes this will make state-of-the-art unsupervised learning research more accessible.
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder.
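In MoCo, the queue holds encoded keys, and the key encoder is updated as a momentum (exponential moving) average of the query encoder, theta_k <- m * theta_k + (1 - m) * theta_q. A minimal sketch of that update, with parameters represented as plain floats for illustration:

```python
def momentum_update(key_params, query_params, m=0.999):
    """MoCo-style momentum update: the key encoder drifts slowly toward
    the query encoder, keeping queued keys roughly consistent."""
    return [m * k + (1 - m) * q for k, q in zip(key_params, query_params)]
```

A large m (e.g. 0.999) makes the key encoder evolve smoothly, which is what keeps the dictionary of queued keys consistent across iterations.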
Scaling and Benchmarking Self-Supervised Visual Representation Learning
TLDR
It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.