• Corpus ID: 238583030

Towards Demystifying Representation Learning with Non-contrastive Self-supervision

@article{Wang2021TowardsDR,
  title={Towards Demystifying Representation Learning with Non-contrastive Self-supervision},
  author={Xiang Wang and Xinlei Chen and Simon Shaolei Du and Yuandong Tian},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.04947}
}
  • Xiang Wang, Xinlei Chen, Simon Shaolei Du, Yuandong Tian
  • Published 11 October 2021
  • Computer Science, Mathematics
  • ArXiv
Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image. These approaches have achieved remarkable performance in practice, but it is not well understood 1) why these methods do not collapse to trivial solutions and 2) how the representation is learned. Tian et al. (2021) made an initial attempt at the first question and proposed DirectPred, which sets the predictor directly. In our… 
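As a concrete illustration of the setup the abstract describes, below is a minimal sketch in PyTorch (not the authors' code; module names, layer sizes, and hyperparameters are illustrative): two augmented views are encoded, a linear predictor maps one branch onto the other, and the target branch is stop-gradiented. The DirectPred-style routine at the end is a hedged paraphrase of Tian et al. (2021): the predictor weight is set from an eigendecomposition of a running correlation estimate instead of being trained by gradient descent.

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class NonContrastiveModel(nn.Module):
      """Minimal BYOL/SimSiam-style model: encoder plus a linear predictor."""
      def __init__(self, in_dim=784, dim=128):
          super().__init__()
          self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, dim))
          self.predictor = nn.Linear(dim, dim, bias=False)

      def forward(self, x1, x2):
          z1, z2 = self.encoder(x1), self.encoder(x2)
          p1, p2 = self.predictor(z1), self.predictor(z2)
          # Symmetric negative cosine similarity between the predicted branch and the
          # stop-gradiented target branch; empirically the stop-gradient (together with
          # the predictor) is what keeps the representation from collapsing.
          return -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                         + F.cosine_similarity(p2, z1.detach(), dim=-1).mean())

  @torch.no_grad()
  def directpred_style_update(predictor, z, corr, rho=0.3, eps=0.1):
      # Hedged sketch of a DirectPred-style update: keep a running estimate of the
      # correlation matrix of the representations, eigendecompose it, and set the
      # predictor weight to (roughly) its matrix square root plus a small boost.
      # The exact normalization follows Tian et al. (2021).
      corr.mul_(rho).add_((1.0 - rho) * (z.T @ z) / z.shape[0])
      s, U = torch.linalg.eigh(corr)
      p = (s.clamp(min=0) / s.max().clamp(min=1e-12)).sqrt() + eps
      predictor.weight.copy_(U @ torch.diag(p) @ U.T)
      return corr

A training step would compute the loss from the two views, backpropagate through the encoder, and either update the predictor by gradient descent (BYOL/SimSiam) or overwrite it with the DirectPred-style rule above.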

Citations

The Power of Contrast for Feature Learning: A Theoretical Analysis
TLDR
It is provably shown that contrastive learning outperforms autoencoders, a classical unsupervised learning method, for both feature recovery and downstream tasks, and the role of labeled data in supervised contrastive learning is illustrated.

References

SHOWING 1-10 OF 15 REFERENCES
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
TLDR
This paper introduces VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually.
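For reference, below is a hedged sketch (in PyTorch; the tensor name z and the gamma and eps values are illustrative) of the variance term that the summary above refers to: a hinge that pushes the standard deviation of each embedding dimension above a target value, discouraging all embeddings from collapsing to a single point.

  import torch

  def variance_regularizer(z, gamma=1.0, eps=1e-4):
      # z: (batch, dim) batch of embeddings.
      std = torch.sqrt(z.var(dim=0) + eps)      # per-dimension standard deviation
      return torch.relu(gamma - std).mean()     # penalize dimensions whose std falls below gamma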
Learning Representations by Maximizing Mutual Information Across Views
TLDR
This work develops a model which learns image representations that significantly outperform prior methods on the tasks the authors consider, and extends this model to use mixture-based representations, where segmentation behaviour emerges as a natural side-effect.
On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
TLDR
This paper suggests that, sometimes, increasing depth can speed up optimization, and proves that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer.
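For the record, the identity behind that claim (a hedged restatement in our own notation, assuming an approximately balanced initialization) is that gradient flow on a depth-$N$ linear overparameterization acts on the end-to-end matrix $W_e = W_N \cdots W_1$ as a preconditioned update,

$$\dot W_e \;=\; -\,\eta \sum_{j=1}^{N} \big[W_e W_e^\top\big]^{\frac{j-1}{N}} \,\nabla L(W_e)\, \big[W_e^\top W_e\big]^{\frac{N-j}{N}},$$

so the acceleration stems from this preconditioning of the end-to-end gradient rather than from the gradient of any regularizer.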
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
TLDR
The speed of convergence to the global optimum is analyzed for gradient descent training a deep linear neural network to minimize the $\ell_2$ loss over whitened data, under an approximately balanced initialization whose loss is smaller than that of any rank-deficient solution.
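Concretely, the objective analyzed there can be written (a hedged restatement in our own notation) as the $\ell_2$ loss of a depth-$N$ linear network,

$$L(W_1,\dots,W_N) \;=\; \tfrac{1}{2}\,\big\lVert W_N W_{N-1}\cdots W_1\,X - Y \big\rVert_F^2,$$

over whitened data ($XX^\top = I$), with approximate balancedness of the initialization meaning $W_{j+1}^\top W_{j+1} \approx W_j W_j^\top$ for every pair of adjacent layers.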
A mathematical theory of semantic development in deep neural networks
TLDR
Notably, this simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep-learning dynamics to give rise to these regularities.
Large Batch Training of Convolutional Networks
TLDR
It is argued that the current recipe for large batch training (linear learning rate scaling with warm-up) is not general enough and training may diverge; a new training algorithm based on Layer-wise Adaptive Rate Scaling (LARS) is proposed.
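As a hedged sketch of the per-layer scaling described above (in PyTorch; the trust coefficient, weight decay, and eps values are illustrative placeholders), each layer receives a local learning rate proportional to the ratio of its weight norm to its gradient norm, so the size of the update stays commensurate with the size of the weights even at very large batch sizes.

  import torch

  @torch.no_grad()
  def lars_local_lr(weight, grad, trust_coefficient=1e-3, weight_decay=1e-4, eps=1e-9):
      # Local LR = eta * ||w|| / (||grad|| + wd * ||w||); the global learning-rate
      # schedule (with warm-up) and momentum are applied on top of this per-layer factor.
      w_norm, g_norm = weight.norm(), grad.norm()
      return trust_coefficient * w_norm / (g_norm + weight_decay * w_norm + eps)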
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
TLDR
It is shown that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.
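The "long plateaus followed by rapid transitions" above have a compact explanation; as a hedged restatement up to time-scale constants, the strength $u$ of each input-output mode of a two-layer linear network evolves under gradient flow as a logistic equation,

$$\tau\,\dot u \;\approx\; u\,(s - u),$$

where $s$ is the corresponding singular value of the input-output correlation. Starting from a small initialization $u_0$, a mode therefore sits on a plateau for a time on the order of $(\tau/s)\ln(s/u_0)$ before rising sigmoidally to $u \approx s$, so modes with larger singular values are learned first.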
High-Dimensional Statistics
TLDR
This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level, and includes chapters focused on core methodology and theory, including tail bounds, concentration inequalities, uniform laws and empirical processes, and random matrices.
High-Dimensional Probability
TLDR
A broad range of illustrations is embedded throughout, including classical and modern results for covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, machine learning, compressed sensing, and sparse regression.
Introduction to the non-asymptotic analysis of random matrices
  • R. Vershynin
  • Mathematics, Computer Science
    Compressed Sensing
  • 2012
TLDR
This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory, particularly for the problem of estimating covariance matrices in statistics and for validating probabilistic constructions of measurement matrices in compressed sensing.
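For orientation, the flavor of covariance estimation treated in that tutorial (a hedged restatement, stated for i.i.d. sub-Gaussian samples $x_1,\dots,x_n \in \mathbb{R}^d$ with covariance $\Sigma$) is

$$\big\lVert \hat\Sigma_n - \Sigma \big\rVert_{\mathrm{op}} \;\lesssim\; \Big(\sqrt{\tfrac{d}{n}} + \tfrac{d}{n}\Big)\,\lVert\Sigma\rVert_{\mathrm{op}} \quad\text{with high probability},\qquad \hat\Sigma_n = \tfrac{1}{n}\sum_{i=1}^{n} x_i x_i^\top,$$

so on the order of $\varepsilon^{-2} d$ samples suffice for relative operator-norm error $\varepsilon$.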