Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
@article{Tao2021ExploringTE,
  title   = {Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework},
  author  = {Chenxin Tao and Honghui Wang and Xizhou Zhu and Jiahua Dong and Shiji Song and Gao Huang and Jifeng Dai},
  journal = {2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2022},
  pages   = {14411-14420}
}
Self-supervised learning has shown great potential for extracting powerful visual representations without human annotations. Various works have been proposed to approach self-supervised learning from different perspectives: (1) contrastive learning methods (e.g., MoCo, SimCLR) utilize both positive and negative samples to guide the training direction; (2) asymmetric network methods (e.g., BYOL, SimSiam) get rid of negative samples via the introduction of a predictor network and the stop-gradient…
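For concreteness, here is a minimal PyTorch sketch (illustrative only, not the paper's code) of the two loss families the abstract contrasts; tensor shapes, temperatures, and function names are assumptions:

```python
import torch
import torch.nn.functional as F

def infonce_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrastive (MoCo/SimCLR-style): other samples in the batch act as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                           # (N, N) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

def asymmetric_loss(p1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Asymmetric (BYOL/SimSiam-style): a predictor output p1 chases a
    stop-gradient target z2; no negatives are involved."""
    p1 = F.normalize(p1, dim=1)
    z2 = F.normalize(z2.detach(), dim=1)                 # stop-gradient on the target branch
    return -(p1 * z2).sum(dim=1).mean()
```

The paper's unified framework analyzes the gradients these objectives produce; the sketch only makes the structural difference (negatives vs. predictor plus stop-gradient) explicit.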
19 Citations
Siamese Image Modeling for Self-Supervised Vision Representation Learning
- Computer Science, ArXiv
- 2022
Siamese Image Modeling is proposed, which predicts the dense representations of an augmented view based on another masked view from the same image but with different augmentations, and can surpass both instance discrimination (ID) and masked image modeling (MIM) on a wide range of downstream tasks.
Improving Masked Autoencoders by Learning Where to Mask
- Computer Science, ArXiv
- 2023
AutoMAE is presented, a fully differentiable framework that uses Gumbel-Softmax to interlink an adversarially trained mask generator with a mask-guided image modeling process; it can adaptively find patches with higher information density for different images and strike a balance between the information gain from image reconstruction and its practical training difficulty.
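The differentiable masking this summary mentions can be illustrated with PyTorch's stock Gumbel-Softmax; a hedged sketch under assumed shapes, not AutoMAE's actual implementation:

```python
import torch
import torch.nn.functional as F

def sample_patch_mask(mask_logits: torch.Tensor, hard: bool = True) -> torch.Tensor:
    """mask_logits: (N, P, 2) per-patch logits over {keep, mask}.
    Returns an (N, P) mask that is discrete in the forward pass but
    differentiable in the backward pass (straight-through estimator)."""
    y = F.gumbel_softmax(mask_logits, tau=1.0, hard=hard, dim=-1)
    return y[..., 1]
```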
ContraNorm: A Contrastive Learning Perspective on Oversmoothing and Beyond
- Computer Science, ArXiv
- 2023
A novel normalization layer called ContraNorm is proposed, inspired by the effectiveness of contrastive learning in preventing dimensional collapse; it implicitly shatters representations in the embedding space, leading to a more uniform distribution and milder dimensional collapse.
Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning
- Computer Science, 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2023
This work proposes a novel formulation of contrastive learning using semantic similarity between instances called Similarity Contrastive Estimation (SCE), which estimates from one view of a batch a continuous distribution to push or pull instances based on their semantic similarities.
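A rough sketch of the soft-target idea described here, hedged heavily (the mixing weight lam and both temperatures are assumptions; see the paper for SCE's exact formulation):

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(z1, z2, lam: float = 0.5, tau: float = 0.1, tau_t: float = 0.07):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    with torch.no_grad():                          # build the target distribution
        sim_t = z2 @ z2.t() / tau_t
        sim_t.fill_diagonal_(float('-inf'))        # drop self-similarity
        w = lam * torch.eye(n, device=z1.device) + (1 - lam) * F.softmax(sim_t, dim=1)
    logits = z1 @ z2.t() / tau                     # one-hot part of w marks the positive
    return -(w * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```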
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
- Computer Science, ArXiv
- 2022
This work proposes a novel formulation of contrastive learning using semantic similarity between instances called Similarity Contrastive Estimation (SCE), and shows that SCE reaches state-of-the-art results for pretraining video representation and that the learned representation can generalize to video downstream tasks.
Ladder Siamese Network: a Method and Insights for Multi-level Self-Supervised Learning
- Computer Science, ArXiv
- 2022
This work proposes the Ladder Siamese Network, a framework that exploits intermediate self-supervision at each stage of deep networks, and improves image-level classification, instance-level detection, and pixel-level segmentation simultaneously.
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
- Computer Science, ArXiv
- 2022
M3I Pre-training, an all-in-one single-stage pre-training approach, achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification and COCO.
FedSiam-DA: Dual-aggregated Federated Learning via Siamese Networks under Non-IID Data
- Computer Science
- 2022
FedSiam-DA, a novel dual-aggregated contrastive federated learning approach that personalizes both local and global models under various settings of data heterogeneity, outperforms several previous FL approaches on heterogeneous datasets.
Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective
- Computer Science, ECCV
- 2022
This paper proposes to Unify existing unsupervised Visual Contrastive Learning methods by using a GCN layer as the predictor layer (UniVCL), which brings two merits to unsupervised learning in object recognition.
RegionCL: Exploring Contrastive Region Pairs for Self-supervised Representation Learning
- Computer Science, ECCV
- 2022
Self-supervised learning (SSL) methods have achieved significant success via maximizing the mutual information between two augmented views, where cropping is a popular augmentation technique…
References
Showing 1-10 of 35 references
Unsupervised Finetuning
- Computer Science, ArXiv
- 2021
This paper finds that the source data is crucial when shifting the finetuning paradigm from supervised to unsupervised, and proposes two simple and effective strategies to combine source and target data for unsupervised finetuning: “sparse source data replaying” and “data mixing”.
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
- Computer Science, ICLR
- 2022
This paper introduces VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually.
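The per-dimension variance term can be written in a few lines; a sketch of that term alone (VICReg also has invariance and covariance terms, and the gamma and eps values here are assumptions):

```python
import torch

def variance_term(z: torch.Tensor, gamma: float = 1.0, eps: float = 1e-4) -> torch.Tensor:
    """Hinge loss keeping the std of every embedding dimension above gamma,
    which prevents all samples from collapsing to the same point."""
    std = torch.sqrt(z.var(dim=0) + eps)
    return torch.relu(gamma - std).mean()
```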
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
- Computer Science, ICML
- 2021
This work proposes an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible.
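A minimal sketch of that objective (the off-diagonal weighting lam is an assumption, not the paper's tuned value):

```python
import torch

def barlow_twins_loss(z1: torch.Tensor, z2: torch.Tensor, lam: float = 5e-3) -> torch.Tensor:
    n = z1.size(0)
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)      # standardize along the batch
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = z1.t() @ z2 / n                              # (d, d) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()   # pull matching dims toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # decorrelate the rest
    return on_diag + lam * off_diag
```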
Exploring Simple Siamese Representation Learning
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
Surprising empirical results are reported that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.
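The stop-gradient trick at the heart of SimSiam fits in a few lines; a sketch assuming p_i = predictor(encoder(view_i)) and z_i = encoder(view_i):

```python
import torch.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    def neg_cos(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=1).mean()  # stop-grad on z
    return 0.5 * neg_cos(p1, z2) + 0.5 * neg_cos(p2, z1)          # symmetrized
```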
Improved Baselines with Momentum Contrastive Learning
- Computer Science, ArXiv
- 2020
With simple modifications to MoCo, this note establishes stronger baselines that outperform SimCLR and do not require large training batches, and hopes this will make state-of-the-art unsupervised learning research more accessible.
ImageNet: A large-scale hierarchical image database
- Computer Science, 2009 IEEE Conference on Computer Vision and Pattern Recognition
- 2009
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Emerging Properties in Self-Supervised Vision Transformers
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This paper questions whether self-supervised learning provides new properties to Vision Transformers (ViT) that stand out compared to convolutional networks (convnets), and implements DINO, a form of self-distillation with no labels, highlighting the synergy between DINO and ViTs.
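The self-distillation loss DINO uses can be sketched as a cross-entropy between a centered, sharpened teacher output and the student output; the temperatures and the centering buffer here are assumptions in line with common descriptions:

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, tau_s: float = 0.1, tau_t: float = 0.04):
    t = F.softmax((teacher_out - center) / tau_t, dim=1).detach()  # sharpened, no grad
    log_s = F.log_softmax(student_out / tau_s, dim=1)
    return -(t * log_s).sum(dim=1).mean()
```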
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
- Computer Science, NeurIPS
- 2020
This paper proposes SwAV, an online algorithm that takes advantage of contrastive methods without requiring pairwise comparisons to be computed, using a swapped prediction mechanism in which the cluster assignment of one view is predicted from the representation of another view.
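The swapped prediction step reads naturally in code; a sketch that assumes the soft assignments q_i are already computed (SwAV obtains them with a Sinkhorn step, omitted here):

```python
import torch.nn.functional as F

def swapped_prediction(scores1, scores2, q1, q2, tau: float = 0.1):
    """scores_i: (N, K) prototype similarities for view i; q_i: (N, K) assignments."""
    l1 = -(q2 * F.log_softmax(scores1 / tau, dim=1)).sum(dim=1).mean()
    l2 = -(q1 * F.log_softmax(scores2 / tau, dim=1)).sum(dim=1).mean()
    return 0.5 * (l1 + l2)
```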
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
- Computer Science, NeurIPS
- 2020
This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
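BYOL's target network is an exponential moving average of the online network; a minimal sketch of that update (the momentum value is the commonly cited default, assumed here):

```python
import torch

@torch.no_grad()
def ema_update(online: torch.nn.Module, target: torch.nn.Module, tau: float = 0.996):
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.data.mul_(tau).add_(p_o.data, alpha=1.0 - tau)
```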
Momentum Contrast for Unsupervised Visual Representation Learning
- Computer Science, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a…
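The queue-as-dictionary idea can be sketched as a fixed-size FIFO of key embeddings; the sizes and the divisibility simplification below are assumptions:

```python
import torch
import torch.nn.functional as F

class KeyQueue:
    """FIFO of normalized key embeddings used as negatives (MoCo-style)."""
    def __init__(self, dim: int = 128, size: int = 4096):
        self.queue = F.normalize(torch.randn(size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, keys: torch.Tensor):
        n = keys.size(0)
        assert self.queue.size(0) % n == 0   # simplification: queue size divisible by batch
        self.queue[self.ptr:self.ptr + n] = keys
        self.ptr = (self.ptr + n) % self.queue.size(0)
```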