Corpus ID: 222378250

Representation Learning via Invariant Causal Mechanisms

Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, Charles Blundell
Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signals by pretraining representations using only unlabeled data. These methods combine heuristic proxy classification tasks with data augmentations and have achieved significant success, but our theoretical understanding of this success remains limited. In this paper we analyze self-supervised representation learning using a causal framework. We show how data augmentations can be more effectively…

Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

ReLICv2 is the first unsupervised representation learning method to consistently outperform a standard supervised baseline in a like-for-like comparison across a wide range of ResNet architectures, and is comparable to state-of-the-art self-supervised vision transformers.

Nonlinear Invariant Risk Minimization: A Causal Approach

Invariant Causal Representation Learning is proposed, a learning paradigm that enables out-of-distribution (OOD) generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers) and builds upon a practical and general assumption: the prior over the data representation factorizes when conditioning on the target and the environment.

Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style

This work introduces Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, which is used to study the effect of data augmentations performed in practice; numerical simulations with dependent latent variables are consistent with the theory.

Regularising for invariance to data augmentation improves supervised learning

It is shown that the predictions of the best performing method are also the most similar when compared on different augmentations of the same input, and an explicit regulariser is proposed that improves generalisation and equalises performance differences between all considered objectives.

Contrastive Unsupervised Learning of World Model with Invariant Causal Features

The proposed world model uses contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation, and significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset.

Improving Self-Supervised Learning by Characterizing Idealized Representations

This work characterizes properties that SSL representations should ideally satisfy and proves necessary and sufficient conditions such that, for any task invariant to given data augmentations, desired probes trained on that representation attain perfect accuracy.

Out-of-distribution Generalization with Causal Invariant Transformations

Under the setting of invariant causal mechanisms, it is theoretically shown that if all causal invariant transformations are available, then one can learn a minimax optimal model across domains using only single-domain data, and that it suffices to know only a subset of these transformations.

Learning towards Robustness in Causally-Invariant Predictors

This work proposes to learn an invariant causal predictor that is robust to distributional shifts, in the supervised regression scenario, and identifies a set of invariant predictors based on the do-operator.

Domain Generalization - A Causal Perspective

This survey argues that causal domain generalization methods can be categorized into three groups based on how causality is leveraged in each method and in which part of the model pipeline it is used, and concludes with insights and discussions on future directions.

GCISG: Guided Causal Invariant Learning for Improved Syn-to-real Generalization

This paper characterizes the domain gap using a causal framework for data generation and proposes causal invariance learning, which encourages the model to learn a style-invariant representation that enhances syn-to-real generalization.

On Mutual Information Maximization for Representation Learning

This paper argues, and provides empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators.

Representation Learning with Contrastive Predictive Coding

This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
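At the heart of Contrastive Predictive Coding is the InfoNCE objective, which scores each anchor's matching ("positive") pair against the other samples in the batch as negatives. A minimal NumPy sketch of that loss, with the temperature, batch size, and embedding dimension chosen purely for illustration:

```python
import numpy as np

def info_nce(z_anchor, z_positive, temperature=0.1):
    """InfoNCE loss over a batch: row i of z_positive is the positive
    for row i of z_anchor; all other rows serve as negatives."""
    # L2-normalise so the dot product is cosine similarity.
    za = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    zp = z_positive / np.linalg.norm(z_positive, axis=1, keepdims=True)
    logits = za @ zp.T / temperature              # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal of the log-softmax matrix.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z)                           # perfectly matched views
shuffled = info_nce(z, rng.normal(size=(8, 16)))   # unrelated "views"
```

As expected, the loss is near zero when the two views agree and close to log(batch size) when they are unrelated; CPC minimizes this loss over encoder parameters rather than evaluating it on fixed features.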

Learning Robust Global Representations by Penalizing Local Predictive Power

A method for training robust convolutional networks by penalizing the predictive power of the local representations learned by earlier layers, which forces networks to discard predictive signals such as color and texture that can be gleaned from local receptive fields and to rely instead on the global structures of the image.

Revisiting Self-Supervised Visual Representation Learning

This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.

Learning Representations by Maximizing Mutual Information Across Views

This work develops a model which learns image representations that significantly outperform prior methods on the tasks the authors consider, and extends this model to use mixture-based representations, where segmentation behaviour emerges as a natural side-effect.

Unsupervised Data Augmentation

UDA's key twist is that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods, which leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small.

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

This paper proposes an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed, and uses a swapped prediction mechanism whereby it predicts the cluster assignment of a view from the representation of another view.
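The swapped prediction mechanism can be illustrated in a few lines: the soft cluster assignment ("code") computed from one view supervises the prediction made from the other view. The sketch below uses plain softmax assignments in place of SwAV's Sinkhorn-Knopp equipartition step, so all names, shapes, and the temperature are illustrative rather than the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1):
    """Cross-entropy between each view's prediction and the *other*
    view's code. SwAV additionally equipartitions the codes with
    Sinkhorn-Knopp and stops gradients through them; omitted here."""
    scores1 = z1 @ prototypes.T                  # (batch, n_prototypes)
    scores2 = z2 @ prototypes.T
    p1, p2 = softmax(scores1 / temperature), softmax(scores2 / temperature)
    q1, q2 = softmax(scores1), softmax(scores2)  # stand-in "codes"
    ce = lambda q, p: -np.mean(np.sum(q * np.log(p + 1e-12), axis=1))
    return 0.5 * (ce(q2, p1) + ce(q1, p2))       # swapped: q2 teaches p1, q1 teaches p2

rng = np.random.default_rng(0)
z = rng.normal(size=(6, 8))
prototypes = rng.normal(size=(4, 8))             # 4 prototype vectors
loss = swapped_prediction_loss(z, z + 0.01 * rng.normal(size=(6, 8)), prototypes)
```

Because each sample is compared only against the prototype vectors, the cost grows with the number of prototypes rather than with the batch size, which is what removes the pairwise comparisons.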

Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning

An unsupervised loss function is proposed that takes advantage of the stochastic nature of these methods and minimizes the difference between the predictions of multiple passes of a training sample through the network.
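The idea reduces to penalizing disagreement between stochastic forward passes of the same input. A toy sketch, where a dropout-masked fixed linear map stands in for a real network (the weights, dropout rate, and shapes are all made up for illustration):

```python
import numpy as np

def stochastic_forward(x, rng, drop_p):
    """One stochastic pass: dropout noise on the input followed by a
    fixed linear map, standing in for a network with internal noise."""
    W = np.arange(12, dtype=float).reshape(4, 3) / 10.0  # fixed "weights"
    mask = rng.random(x.shape) > drop_p
    return (x * mask) @ W

def consistency_loss(x, rng, drop_p=0.3, n_passes=2):
    """Mean squared difference between the predictions of multiple
    stochastic passes of the same training sample through the network."""
    preds = [stochastic_forward(x, rng, drop_p) for _ in range(n_passes)]
    pairs = [(i, j) for i in range(n_passes) for j in range(i + 1, n_passes)]
    return float(np.mean([np.mean((preds[i] - preds[j]) ** 2) for i, j in pairs]))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
noisy = consistency_loss(x, rng, drop_p=0.3)  # passes disagree under dropout
clean = consistency_loss(x, rng, drop_p=0.0)  # no noise: passes agree exactly
```

Since the loss requires no labels, it can be applied to unlabeled samples alongside a standard supervised loss on the labeled subset, which is how such consistency terms are used in semi-supervised training.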

A Theoretical Analysis of Contrastive Unsupervised Representation Learning

This framework provides provable guarantees on the performance of the learned representations on the average classification task comprising a subset of the same set of latent classes, and shows that learned representations can reduce (labeled) sample complexity on downstream tasks.

What Makes for Good Views for Contrastive Learning?

This paper uses empirical analysis to better understand the importance of view selection, argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devises unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.