• Corpus ID: 245006078

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

  • Chenxin Tao, Honghui Wang, Xizhou Zhu, Jiahua Dong, Shiji Song, Gao Huang, Jifeng Dai
Self-supervised learning has shown great potential to extract powerful visual representations without human annotations. Various works have been proposed to address self-supervised learning from different perspectives: (1) contrastive learning methods (e.g., MoCo, SimCLR) utilize both positive and negative samples to guide the training direction; (2) asymmetric network methods (e.g., BYOL, SimSiam) get rid of negative samples via the introduction of a predictor network and the stop-gradient… 
1 Citation

Relational Self-Supervised Learning
This paper introduces a novel SSL paradigm, termed the relational self-supervised learning (ReSSL) framework, which learns representations by modeling the relationships between different instances, using a sharpened distribution of pairwise similarities among instances as the relation metric.


Exploring Simple Siamese Representation Learning
  • Xinlei Chen, Kaiming He
  • Computer Science
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
Surprising empirical results are reported that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.
Self-Supervised Learning by Estimating Twin Class Distributions
TWIST performs surprisingly well on semi-supervised learning, achieving 61.2% top-1 accuracy with 1% ImageNet labels using a ResNet-50 as backbone, surpassing previous best results by an absolute improvement of 6.2%.
BYOL works even without batch statistics
Replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL, and disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.
Emerging Properties in Self-Supervised Vision Transformers
This paper asks whether self-supervised learning gives Vision Transformers (ViTs) new properties that stand out compared to convolutional networks (convnets), and implements DINO, a form of self-distillation with no labels, showing the synergy between DINO and ViTs.
Improving Contrastive Learning by Visualizing Feature Transformation
This paper devises a feature-level data manipulation, distinct from data augmentation, to enhance generic contrastive self-supervised learning, and proposes interpolation among negatives, which provides diversified negatives and makes the model more discriminative.
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
This paper introduces VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually.
Representation Learning with Contrastive Predictive Coding
This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features
Patches are cut and pasted among training images, with the ground truth labels mixed proportionally to the area of the patches; CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task.
Improved Baselines with Momentum Contrastive Learning
With simple modifications to MoCo, this note establishes stronger baselines that outperform SimCLR and do not require large training batches, and hopes this will make state-of-the-art unsupervised learning research more accessible.
Unsupervised Finetuning
This paper finds that the source data is crucial when shifting the finetuning paradigm from supervised to unsupervised, and proposes two simple and effective strategies to combine source and target data in unsupervised finetuning: “sparse source data replaying” and “data mixing”.