# Towards Demystifying Representation Learning with Non-contrastive Self-supervision

@article{Wang2021TowardsDR, title={Towards Demystifying Representation Learning with Non-contrastive Self-supervision}, author={Xiang Wang and Xinlei Chen and Simon Shaolei Du and Yuandong Tian}, journal={ArXiv}, year={2021}, volume={abs/2110.04947} }

Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image. These approaches have achieved remarkable performance in practice, but it is not well understood 1) why these methods do not collapse to the trivial solutions and 2) how the representation is learned. Tian et al. (2021) made an initial attempt on the first question and proposed DirectPred that sets the predictor directly. In our…
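To make "minimizing the distance between two views" concrete, here is a minimal sketch of a SimSiam-style objective: a symmetrized negative cosine similarity between a predictor's output on one view and the embedding of the other view. The function names, the linear predictor, and the toy vectors are assumptions for illustration and may not match the objective analyzed in the paper. Only the forward loss value is computed; the stop-gradient that BYOL and SimSiam apply in the backward pass is not modeled.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def linear_predictor(W, z):
    """Apply a linear predictor W (given as a list of rows) to embedding z."""
    return [sum(w * x for w, x in zip(row, z)) for row in W]


def non_contrastive_loss(W, z1, z2):
    """Symmetrized negative cosine similarity between the two views.

    z1 and z2 stand for embeddings of two augmented views of one image.
    In a real training loop the target side (z2 in the first term, z1 in
    the second) would be wrapped in a stop-gradient; that is omitted here.
    """
    p1 = linear_predictor(W, z1)
    p2 = linear_predictor(W, z2)
    return -0.5 * (cosine_similarity(p1, z2) + cosine_similarity(p2, z1))


# Hypothetical toy inputs: an identity predictor and 2-D embeddings.
identity = [[1.0, 0.0], [0.0, 1.0]]
aligned = non_contrastive_loss(identity, [1.0, 2.0], [2.0, 4.0])
orthogonal = non_contrastive_loss(identity, [1.0, 0.0], [0.0, 1.0])
opposite = non_contrastive_loss(identity, [1.0, 0.0], [-1.0, 0.0])
```

The loss is lowest when the two views' embeddings point in the same direction and highest when they point in opposite directions. Nothing in this loss penalizes mapping every input to the same vector, which is why the collapse question raised in the abstract arises.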

## One Citation

The Power of Contrast for Feature Learning: A Theoretical Analysis

- Computer Science, Mathematics
- ArXiv
- 2021

It is proved that contrastive learning outperforms autoencoders, a classical unsupervised learning method, in both feature recovery and downstream tasks, and the role of labeled data in supervised contrastive learning is illustrated.

## References

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

- Computer Science
- ArXiv
- 2021

This paper introduces VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually.

Learning Representations by Maximizing Mutual Information Across Views

- Computer Science, Mathematics
- NeurIPS
- 2019

This work develops a model which learns image representations that significantly outperform prior methods on the tasks the authors consider, and extends this model to use mixture-based representations, where segmentation behaviour emerges as a natural side-effect.

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

- Computer Science, Mathematics
- ICML
- 2018

This paper suggests that, sometimes, increasing depth can speed up optimization and proves that it is mathematically impossible to obtain the acceleration effect of overparametrization via gradients of any regularizer.

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

- Computer Science, Mathematics
- ICLR
- 2019

The speed of convergence to the global optimum is analyzed for gradient descent training a deep linear neural network by minimizing the $\ell_2$ loss over whitened data; convergence at a linear rate is guaranteed under conditions that include the initial loss being smaller than the loss of any rank-deficient solution.

A mathematical theory of semantic development in deep neural networks

- Computer Science, Medicine
- Proceedings of the National Academy of Sciences
- 2019

Notably, this simple neural model qualitatively recapitulates many diverse regularities underlying semantic development, while providing analytic insight into how the statistical structure of an environment can interact with nonlinear deep-learning dynamics to give rise to these regularities.

Large Batch Training of Convolutional Networks

- Computer Science
- 2017

It is argued that the current recipe for large batch training (linear learning rate scaling with warm-up) is not general enough and that training may diverge; a new training algorithm based on Layer-wise Adaptive Rate Scaling (LARS) is proposed.

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

- Computer Science, Physics
- ICLR
- 2014

It is shown that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.

High-Dimensional Statistics

- Computer Science
- 2019

This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level, and includes chapters that are focused on core methodology and theory - including tail bounds, concentration inequalities, uniform laws and empirical process, and random matrices.

High-Dimensional Probability

- Computer Science
- 2018

A broad range of illustrations is embedded throughout, including classical and modern results for covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, machine learning, compressed sensing, and sparse regression.

Introduction to the non-asymptotic analysis of random matrices

- Mathematics, Computer Science
- Compressed Sensing
- 2012

This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory, particularly for the problem of estimating covariance matrices in statistics and for validating probabilistic constructions of measurement matrices in compressed sensing.