• Corpus ID: 247158785

Understanding Contrastive Learning Requires Incorporating Inductive Biases

Nikunj Saunshi, Jordan T. Ash, Surbhi Goel, Dipendra Kumar Misra, Cyril Zhang, Sanjeev Arora, Sham M. Kakade, Akshay Krishnamurthy

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations than augmentations of different inputs. Recent attempts to theoretically explain the success of contrastive learning on downstream classification tasks prove guarantees that depend on properties of the augmentations and on the value of the contrastive loss of the representations. We demonstrate that such analyses, which ignore inductive biases of…
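
To make the objective described above concrete, here is a minimal sketch of an InfoNCE-style contrastive loss for a single anchor. It assumes cosine similarity and a temperature of 0.5; the names `info_nce` and `cosine` and the temperature value are illustrative choices, not the paper's formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Cross-entropy of picking the positive view (same input) among the
    positive plus the negative views (different inputs)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # log-sum-exp with max subtraction for numerical stability
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_denom - logits[0]

# The loss is lower when the anchor is closer to its positive than to negatives.
close = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
far = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

When all similarities are equal, the loss equals log(k + 1) for k negatives, i.e. chance level.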

Improving Self-Supervised Learning by Characterizing Idealized Representations

This work characterizes the properties that SSL representations should ideally satisfy and proves necessary and sufficient conditions under which, for any task invariant to the given data augmentations, the desired probes trained on that representation attain perfect accuracy.

Do More Negative Samples Necessarily Hurt in Contrastive Learning?

It is shown in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation optimizing the population contrastive loss in fact does not degrade with the number of negative samples.

Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning

This work studies the role of nonlinearity in the training dynamics of contrastive learning on one- and two-layer nonlinear networks with homogeneous activations, and discovers global modulation: local patterns that are discriminative from the perspective of global-level patterns are prioritized during learning, which further characterizes the learning process.

Rethinking Positive Sampling for Contrastive Learning with Kernel

This work proposes a new way to define positive samples using kernel theory along with a novel loss called decoupled uniformity, and draws a connection between contrastive learning and the conditional mean embedding theory to derive tight bounds on the downstream classification loss.

Analyzing Data-Centric Properties for Contrastive Learning on Graphs

This work rigorously contextualizes the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL, and finds that CAAs induce better invariance and separability than GGAs in this setting.

Understanding Self-Supervised Graph Representation Learning from a Data-Centric Perspective

This work rigorously contextualizes the effects of data-centric assumptions on graph SSL paradigms and augmentations, shows how GGAs can affect the recoverability and separability assumptions, and theoretically motivates the need for task-relevant data augmentations.

A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint

This study builds on previous surveys of text-to-image (T2I) generation and adds value by evaluating the diverse range of existing methods, including different generative models and several types of visual output, critically examining the various approaches, highlighting their shortcomings, and suggesting future research directions.

Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering

The algorithmic pipeline in Orchestra guarantees good generalization performance under a linear probe, allowing it to outperform alternative techniques in a broad range of conditions, including variation in heterogeneity, number of clients, participation ratio, and local epochs.

What shapes the loss landscape of self-supervised learning?

Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL). However, questions remain in our theoretical

What makes for good views for contrastive learning

This paper uses empirical analysis to better understand the importance of view selection, argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devises unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.

Can contrastive learning avoid shortcut solutions?

The generalization of representations learned via contrastive learning depends crucially on what features of the data are extracted. However, we observe that the contrastive loss does not always

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

This work proposes a loss that performs spectral decomposition on the population augmentation graph and can be succinctly written as a contrastive learning objective on neural net representations, leading to features with provable accuracy guarantees under linear probe evaluation.
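
For intuition, the spectral contrastive loss can be written at the population level as -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2], where (x, x+) are two augmentations of the same input and x, x' are independent. The sketch below is a simple mini-batch estimate that assumes cross-pairs (i, j) with i != j stand in for independent draws; the function names are hypothetical and the estimator used in the paper's experiments may differ.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def spectral_contrastive_loss(z, z_pos):
    """Mini-batch estimate of -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2].
    z[i] and z_pos[i] are representations of two views of example i."""
    n = len(z)
    if n < 2:
        raise ValueError("need at least two positive pairs")
    # Reward: large inner products between views of the same example.
    pos_term = -2.0 * sum(dot(z[i], z_pos[i]) for i in range(n)) / n
    # Penalty: squared inner products between views of different examples.
    neg_term = sum(dot(z[i], z_pos[j]) ** 2
                   for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return pos_term + neg_term

spread = spectral_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
collapsed = spectral_contrastive_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
```

Note that no softmax or temperature appears: the squared-inner-product penalty is what ties the objective to a spectral decomposition, and a collapsed representation scores worse than one with orthogonal per-example features.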

Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning

It is proved that contrastive learning with ReLU networks learns the desired sparse features if proper augmentations are adopted, and an underlying principle called feature decoupling is presented to explain the effects of augmentations.

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task comprising a subset of the same set of latent classes, and shows that learned representations can reduce the (labeled) sample complexity of downstream tasks.

A Simple Framework for Contrastive Learning of Visual Representations

It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps than supervised learning does.
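
The link between batch size and negatives can be seen in a batched, NT-Xent-style loss. The sketch below is a minimal version, assuming cosine similarity and a temperature of 0.5: each of the 2N augmented views uses its partner view as the positive and the other 2N - 2 views in the batch as negatives, so a larger batch supplies more negatives. The function name `nt_xent` and the default temperature are illustrative, and the projection head is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(views_a, views_b, temperature=0.5):
    """views_a[k] and views_b[k] are two augmentations of example k.
    Returns the loss averaged over all 2N views."""
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        partner = (i + n) % (2 * n)  # index of the other view of the same example
        # Denominator: every other view in the batch (positive and negatives).
        others = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denom - cosine(z[i], z[partner]) / temperature
    return total / (2 * n)

aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```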

Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere

This work identifies two key properties related to the contrastive loss: alignment (closeness) of features from positive pairs, and uniformity of the induced distribution of the (normalized) features on the hypersphere.
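
The two properties can be measured directly. The sketch below assumes the commonly stated forms: alignment as the mean of ||f(x) - f(y)||^alpha over positive pairs, and uniformity as the log of the mean of exp(-t ||f(x) - f(y)||^2) over pairs of samples, both on L2-normalized features, with alpha = 2 and t = 2 as defaults. Helper names are hypothetical, and lists of floats stand in for feature vectors.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def align_loss(x, y, alpha=2):
    """Mean ||f(x) - f(y)||^alpha over positive pairs (x[i], y[i])."""
    return sum(sq_dist(normalize(a), normalize(b)) ** (alpha / 2)
               for a, b in zip(x, y)) / len(x)

def uniform_loss(x, t=2.0):
    """log of mean exp(-t ||f(x_i) - f(x_j)||^2) over distinct pairs i < j."""
    z = [normalize(v) for v in x]
    vals = [math.exp(-t * sq_dist(z[i], z[j]))
            for i in range(len(z)) for j in range(i + 1, len(z))]
    return math.log(sum(vals) / len(vals))
```

Lower is better for both: a fully collapsed set of features scores 0 on uniformity, while antipodal points on the circle score -8 with t = 2.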

Representation Learning with Contrastive Predictive Coding

This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.

Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style

This work introduces Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, and uses it to study the effect of data augmentations performed in practice; numerical simulations with dependent latent variables are consistent with the theory.

Exploring Simple Siamese Representation Learning

  • Xinlei Chen, Kaiming He
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
Surprising empirical results are reported that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders.
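
For reference, the Siamese objective uses no negatives at all: it is a symmetrized negative cosine similarity between one view's predictor output and the other view's encoder output. The sketch below computes only the loss value, assuming z_i = f(x_i) is the encoder output for view i and p_i = h(z_i) is the predictor output. In the method, z is wrapped in a stop-gradient, which affects only backpropagation and is not modeled in this stdlib-only sketch. Function names are illustrative.

```python
import math

def neg_cosine(p, z):
    """D(p, z) = -cos(p, z). In SimSiam, z would be detached (stop-gradient)."""
    dot = sum(a * b for a, b in zip(p, z))
    return -dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in z)))

def simsiam_loss(p1, z1, p2, z2):
    """Symmetrized loss: 0.5 * D(p1, z2) + 0.5 * D(p2, z1)."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

The minimum value is -1, reached when each prediction points in the same direction as the other view's representation.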