Exploring Simple Siamese Representation Learning

@inproceedings{Chen2021ExploringSS,
  title={Exploring Simple Siamese Representation Learning},
  author={Xinlei Chen and Kaiming He},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2021},
  pages={15745-15753}
}
  • Published 20 November 2020
Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our… 
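The objective described above, maximizing agreement between two augmented views of one image without negative pairs, can be sketched as a symmetrized negative cosine similarity (a minimal NumPy sketch; the function names are illustrative, and the stop-gradient the method applies to the target branch is only noted in comments, since plain NumPy has no autograd):

```python
import numpy as np

def neg_cosine(p, z):
    """Negative cosine similarity D(p, z), averaged over the batch."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -(p * z).sum(axis=1).mean()

def simsiam_loss(p1, p2, z1, z2):
    """Symmetrized loss: L = D(p1, z2)/2 + D(p2, z1)/2.

    p1, p2: predictor outputs for the two views.
    z1, z2: projector outputs; in a real framework these would be
    detached (stop-gradient) before being passed to the loss.
    """
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

With a real encoder and predictor, z1 and z2 would be detached before entering the loss; without that stop-gradient the objective admits collapsed (constant) solutions.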

Citations

Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework
UniGrad, a simple but effective gradient form for self-supervised learning, is proposed; it requires neither a memory bank nor a predictor network, yet still achieves state-of-the-art performance and can easily adopt other training strategies.
Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning
A novel self-supervised approach to learn node representations by enhancing Siamese self-distillation with multi-scale contrastive learning that not only achieves new state-of-the-art results but also surpasses some semi-supervised counterparts by large margins.
Jointly Learnable Data Augmentations for Self-Supervised GNNs
This study proposes GRAPHSURGEON, a novel SSL method for GNNs whose learnable data augmentation is jointly learned with the embeddings by leveraging the inherent signal encoded in the graph.
Compressive Visual Representations
This work hypothesizes that adding explicit information compression to SimCLR and BYOL yields better and more robust representations, and confirms that adding compression to these algorithms significantly improves linear evaluation accuracies and model robustness across a wide range of domain shifts.
Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly
A current remarkable improvement of unsupervised visual representation learning is based on heavy networks with large-batch training. While recent methods have greatly reduced the gap between
From Canonical Correlation Analysis to Self-supervised Graph Neural Networks
A conceptually simple yet effective model for self-supervised representation learning with graph data is introduced; it discards augmentation-variant information by learning invariant representations and prevents degenerate solutions by decorrelating features in different dimensions.
Cluster Analysis with Deep Embeddings and Contrastive Learning
This work proposes a novel framework for image clustering from deep embeddings that combines instance-level contrastive learning with a deep-embedding-based cluster-center predictor; it performs on par with widely accepted clustering methods and outperforms the state-of-the-art contrastive learning method on the CIFAR-10 dataset.
Negative sampling strategies for contrastive self-supervised learning of graph representations
This work proposes a general framework for learning node representations in a self-supervised manner, called Graph Contrastive Learning (GraphCL), which learns node embeddings by maximizing the similarity between the node representations of two randomly perturbed versions of the same graph.
Un-Mix: Rethinking Image Mixtures for Unsupervised Visual Representation Learning
This work aims to bring the notion of distance on the label space into unsupervised learning, making the model aware of the soft degree of similarity between positive or negative pairs by mixing the input data space, so that the input and loss spaces work collaboratively.
High Fidelity Visualization of What Your Self-Supervised Representation Knows About
This work showcases the use of a conditional diffusion-based generative model (RCDM) to visualize representations learned with self-supervised models, and shows visually that SSL (backbone) representations are not really invariant to many of the data augmentations they were trained with.

References

Showing 1-10 of 38 references
Siamese Neural Networks for One-Shot Image Recognition
A method for learning Siamese neural networks which employ a unique structure to naturally rank similarity between inputs, achieving strong results that exceed those of other deep learning models, with near state-of-the-art performance on one-shot classification tasks.
Representation Learning with Contrastive Predictive Coding
This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
Self-labelling via simultaneous clustering and representation learning
The proposed novel and principled learning formulation is able to self-label visual data so as to train highly competitive image representations without manual labels and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline.
Learning deep representations by mutual information estimation and maximization
It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
A reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction is presented, improving the conditioning of the optimization problem and speeding up convergence of stochastic gradient descent.
Unsupervised Feature Learning via Non-parametric Instance Discrimination
This work formulates this intuition as a non-parametric classification problem at the instance level, and uses noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes.
Learning Representations by Maximizing Mutual Information Across Views
This work develops a model which learns image representations that significantly outperform prior methods on the tasks the authors consider, and extends this model to use mixture-based representations, where segmentation behaviour emerges as a natural side-effect.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.