# Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

@inproceedings{HaoChen2021ProvableGF, title={Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss}, author={Jeff Z. HaoChen and Colin Wei and Adrien Gaidon and Tengyu Ma}, booktitle={Neural Information Processing Systems}, year={2021} }

Recent works in self-supervised learning have advanced the state-of-the-art by relying on the contrastive learning paradigm, which learns representations by pushing positive pairs, or similar examples from the same class, closer together while keeping negative pairs far apart. Despite the empirical successes, theoretical foundations are limited – prior analyses assume conditional independence of the positive pairs given the same class label, but recent empirical applications use heavily…
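The abstract above is truncated and does not state the loss itself. As a rough illustration only, the sketch below assumes the commonly stated form of the spectral contrastive loss, $-2\,\mathbb{E}[f(x)^\top f(x^+)] + \mathbb{E}[(f(x)^\top f(x'))^2]$, and estimates it on a batch of paired embeddings. The function name `spectral_contrastive_loss`, the use of NumPy, and the choice to treat off-diagonal pairs in the batch as negatives are assumptions made for this example, not details taken from this page.

```python
import numpy as np

def spectral_contrastive_loss(z1, z2):
    """Batch estimate of -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2].

    z1[i] and z2[i] are embeddings of two augmented views of example i
    (a positive pair); z1[i] and z2[j] with i != j are used as negatives.
    This estimator is an illustrative assumption, not the authors' code.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    n = z1.shape[0]
    if n < 2 or z1.shape != z2.shape:
        raise ValueError("need two equally shaped batches with at least 2 rows")
    sim = z1 @ z2.T                      # sim[i, j] = <z1[i], z2[j]>
    pos = np.trace(sim) / n              # mean similarity of positive pairs
    off_diag = ~np.eye(n, dtype=bool)    # mask selecting the negative pairs
    neg = np.mean(sim[off_diag] ** 2)    # mean squared similarity of negatives
    return -2.0 * pos + neg

# Orthonormal embeddings with perfectly aligned views.
aligned = np.eye(2)
loss_aligned = spectral_contrastive_loss(aligned, aligned)

# Collapsed embeddings: every example maps to the same vector.
collapsed = np.array([[1.0, 0.0], [1.0, 0.0]])
loss_collapsed = spectral_contrastive_loss(collapsed, collapsed)
```

In this toy case the aligned, orthogonal embeddings score lower than the collapsed ones, which matches the intuition of pulling positive pairs together while keeping negative pairs apart.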

## 78 Citations

### What shapes the loss landscape of self-supervised learning?

- Computer Science, ArXiv
- 2022

An analytically tractable theory of the SSL landscape is derived, and it is shown that it accurately captures an array of collapse phenomena and identifies their causes.

### Self-Supervised Learning with an Information Maximization Criterion

- Computer Science, ArXiv
- 2022

This article proposes a self-supervised learning method that uses a second-order statistics-based mutual information measure that reflects the level of correlation among its arguments and prevents dimensional collapse by encouraging the spread of information across the whole feature space.

### Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering

- Computer Science, ICML
- 2022

The algorithmic pipeline in Orchestra guarantees good generalization performance under a linear probe, allowing it to outperform alternative techniques in a broad range of conditions, including variation in heterogeneity, number of clients, participation ratio, and local epochs.

### Understanding Contrastive Learning Requires Incorporating Inductive Biases

- Computer Science, ICML
- 2022

It is demonstrated that analyses that ignore the inductive biases of the function class and training algorithm cannot adequately explain the success of contrastive learning, and provably lead to vacuous guarantees in some settings.

### Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

- Computer Science, ICML
- 2022

It is theoretically shown that contrastive pre-training can learn features that vary substantially across domains but still generalize to the target domain, by disentangling domain and class information, and this is empirically validated on benchmark vision datasets.

### Can Single-Pass Contrastive Learning Work for Both Homophilic and Heterophilic Graph?

- Computer Science
- 2022

The concentration property of features obtained by neighborhood aggregation on both homophilic and heterophilic graphs is analyzed, a single-pass graph contrastive learning loss based on this property is introduced, and performance guarantees for the minimizer of the loss on downstream tasks are provided.

### Neural Eigenfunctions Are Structured Representation Learners

- Computer Science, ArXiv
- 2022

This paper shows that, when the kernel is derived from positive relations in a contrastive learning setup, the method outperforms a number of competitive baselines in visual representation learning and transfer learning benchmarks, and importantly, produces structured representations where the order of features indicates degrees of importance.

### How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders

- Computer Science
- 2022

A theoretical understanding of how masking matters for MAE to learn meaningful features is proposed, and a close connection between MAE and contrastive learning is established, which shows that MAE implicitly aligns the mask-induced positive pairs.

### Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions

- Computer Science, ArXiv
- 2022

It is proved that a simple representation obtained by combining this kernel with PCA provably minimizes the worst-case approximation error of linear predictors, under a straightforward assumption that positive pairs have similar labels.

### Joint Embedding Self-Supervised Learning in the Kernel Regime

- Computer Science
- 2022

This work derives the optimal form of the output representations for contrastive and non-contrastive loss functions in kernel-based algorithms, where embeddings are constructed by linear maps acting on the feature space of a kernel.

## References

### A Simple Framework for Contrastive Learning of Visual Representations

- Computer Science, ICML
- 2020

It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.

### Exploring Simple Siamese Representation Learning

- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021

Surprising empirical results are reported that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.

### Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations

- Computer Science, ArXiv
- 2022

It is proved that linear transferability can occur when data from the same class in different domains (e.g., photo dogs and cartoon dogs) are more related to each other than data from different classes in different domains are.

### Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation

- Computer Science, ICML
- 2022

It is theoretically shown that contrastive pre-training can learn features that vary substantially across domains but still generalize to the target domain, by disentangling domain and class information, and this is empirically validated on benchmark vision datasets.

### VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

- Computer Science, ICLR
- 2022

This paper introduces VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually.

### MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering

- Computer Science, ICLR
- 2021

A scalable variant of the Expectation-Maximization algorithm for MiCE is developed and proof of the convergence is provided to solve the nontrivial inference and learning problems caused by the latent variables.

### Barlow Twins: Self-Supervised Learning via Redundancy Reduction

- Computer Science, ICML
- 2021

This work proposes an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible.

### A Theory of Label Propagation for Subpopulation Shift

- Computer Science, ICML
- 2021

This work proposes a provably effective framework for domain adaptation via label propagation, based on a simple but realistic expansion assumption, and adapts consistency-based semi-supervised learning methods to domain adaptation settings, gaining significant improvements.

### Contrastive Learning Inverts the Data Generating Process

- Biology, ICML
- 2021

It is proved that feed-forward models trained with objectives belonging to the commonly used InfoNCE family learn to implicitly invert the underlying generative model of the observed data.

### An Almost Constant Lower Bound of the Isoperimetric Coefficient in the KLS Conjecture

- Mathematics
- 2020

We prove an almost constant lower bound of the isoperimetric coefficient in the KLS conjecture. The lower bound has the dimension dependency $$d^{-o_d(1)}$$. When the dimension is…