Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions

Daniel D. Johnson, Ayoub El Hanchi, Chris J. Maddison

Contrastive learning is a powerful framework for learning self-supervised representations that generalize well to downstream supervised tasks. We show that multiple existing contrastive learning methods can be reinterpreted as learning kernel functions that approximate a fixed positive-pair kernel. We then prove that a simple representation obtained by combining this kernel with PCA minimizes the worst-case approximation error of linear predictors, under a straightforward assumption…
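As a concrete illustration of the kernel-plus-PCA recipe in the abstract, here is a minimal sketch assuming the positive-pair kernel matrix `K` over a batch of examples is already available (in the paper it would come from a trained contrastive model; here a symmetric PSD matrix is simply fabricated):

```python
import numpy as np

# Fabricated stand-in for a positive-pair kernel matrix over n = 8 examples
# (assumption: symmetric positive semi-definite, as a real kernel would be).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))
K = A @ A.T  # symmetric PSD by construction

# Kernel PCA step: the top-d eigenvectors of K, scaled by the square roots
# of their eigenvalues, give a d-dimensional representation per example.
d = 2
eigvals, eigvecs = np.linalg.eigh(K)   # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:d]    # indices of the d largest eigenvalues
Z = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```

The rows of `Z` are the low-dimensional representations; `Z @ Z.T` is the best rank-`d` approximation of `K` in Frobenius norm.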


Neural Eigenfunctions Are Structured Representation Learners

This paper shows that, when the kernel is derived from positive relations in a contrastive learning setup, the proposed method outperforms a number of competitive baselines on visual representation learning and transfer learning benchmarks and, importantly, produces structured representations in which the order of features indicates their degree of importance.



Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss

This work proposes a loss that performs spectral decomposition on the population augmentation graph and can be succinctly written as a contrastive learning objective on neural net representations, leading to features with provable accuracy guarantees under linear probe evaluation.
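The spectral contrastive loss described above can be sketched on a batch of paired embeddings; this is an illustrative in-batch version (using the other pairs in the batch as negatives), not the authors' reference implementation:

```python
import numpy as np

def spectral_contrastive_loss(z1, z2):
    """Spectral contrastive loss on paired embeddings z1, z2 of shape (n, d):
    -2 * E[z1_i . z2_i] + E_{i != j}[(z1_i . z2_j)^2]."""
    n = z1.shape[0]
    pos = np.einsum('nd,nd->n', z1, z2).mean()     # matched-pair similarities
    gram = z1 @ z2.T                               # all pairwise inner products
    off_diag = gram[~np.eye(n, dtype=bool)]        # mismatched pairs as negatives
    return -2.0 * pos + (off_diag ** 2).mean()
```

Minimizing this loss amounts to a low-rank factorization of the population augmentation graph's adjacency matrix, which is what yields the linear-probe guarantees.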

NeuralEF: Deconstructing Kernels by Deep Neural Networks

This work introduces a new series of objective functions that generalizes the EigenGame, provides accurate approximations to the eigenfunctions of polynomial, radial basis function, neural network Gaussian process, and neural tangent kernels, and can scale the linearised Laplace approximation of deep neural networks to modern image classification datasets by approximating the Gauss-Newton matrix.

A Simple Framework for Contrastive Learning of Visual Representations

It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps than supervised learning does.

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

This paper introduces VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually.
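The three VICReg terms can be sketched as follows; weights and the target standard deviation of 1 follow the published defaults, but this is a simplified illustration rather than the official implementation:

```python
import numpy as np

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg on paired embeddings z1, z2 of shape (n, d)."""
    n, d = z1.shape
    # Invariance: mean squared distance between paired embeddings.
    sim = ((z1 - z2) ** 2).sum(axis=1).mean()
    # Variance: hinge on each dimension's std-dev (prevents collapse).
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.maximum(0.0, 1.0 - std).mean()
    # Covariance: penalize off-diagonal covariance entries (decorrelates dims).
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        return (cov[~np.eye(d, dtype=bool)] ** 2).sum() / d
    return (sim_w * sim
            + var_w * (var_term(z1) + var_term(z2))
            + cov_w * (cov_term(z1) + cov_term(z2)))
```

Note that the variance term is computed per dimension across the batch, which is exactly the mechanism that rules out the collapsed (constant) solution.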

Deep Contrastive Learning is Provably (almost) Principal Component Analysis

We show that Contrastive Learning (CL) under a family of loss functions (including InfoNCE) has a game-theoretic formulation, in which the max player finds a representation that maximizes contrastiveness…

Representation Learning with Contrastive Predictive Coding

This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
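The InfoNCE objective at the heart of Contrastive Predictive Coding can be sketched as a cross-entropy over in-batch similarities; this is an illustrative NumPy version, not the paper's code:

```python
import numpy as np

def info_nce(z, z_pos, temperature=0.1):
    """InfoNCE: each z_i should score its own positive z_pos_i higher than
    the other in-batch samples, which serve as negatives."""
    logits = (z @ z_pos.T) / temperature           # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # correct pairs on diagonal
```

Minimizing this loss maximizes a lower bound on the mutual information between the representation and the positive sample, which is the paper's motivation for the objective.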

Deep Kernel Learning

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs…

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

This framework allows us to show provable guarantees on the performance of the learned representations on an average classification task comprising a subset of the same set of latent classes, and shows that the learned representations can reduce the (labeled) sample complexity of downstream tasks.

Understanding Contrastive Learning Requires Incorporating Inductive Biases

It is demonstrated that analyses that ignore the inductive biases of the function class and training algorithm cannot adequately explain the success of contrastive learning, and can even provably lead to vacuous guarantees in some settings.

Differentiable Compositional Kernel Learning for Gaussian Processes

The Neural Kernel Network (NKN), a flexible family of kernels represented by a neural network, is presented; it is built on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel.
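The composition rules the NKN relies on are the classical closure properties of positive-definite kernels: sums and products of valid kernels are valid kernels. A minimal sketch of that principle (hypothetical scalar kernels, not the NKN architecture itself):

```python
import numpy as np

def rbf(x, y, ls=1.0):
    # Squared-exponential kernel on scalars.
    return np.exp(-(x - y) ** 2 / (2.0 * ls ** 2))

def linear(x, y):
    # Linear kernel (rank-1 feature map x -> x).
    return x * y

def composed(x, y):
    # Sums and products of valid kernels are valid kernels,
    # so this composition is itself a valid kernel.
    return rbf(x, y) + 0.5 * rbf(x, y) * linear(x, y)

# The resulting Gram matrix stays (numerically) positive semi-definite.
xs = np.linspace(-1.0, 1.0, 5)
gram = np.array([[composed(a, b) for b in xs] for a in xs])
```

Each NKN unit applies exactly such a sum or product, so the whole network evaluates to one valid kernel whose parameters can be learned by gradient descent.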