Corpus ID: 235652395

Decomposed Mutual Information Estimation for Contrastive Representation Learning

@article{Sordoni2021DecomposedMI,
  title={Decomposed Mutual Information Estimation for Contrastive Representation Learning},
  author={Alessandro Sordoni and Nouha Dziri and Hannes Schulz and Geoffrey J. Gordon and Philip Bachman and R{\'e}mi Tachet des Combes},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.13401}
}
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. E.g., we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but have a strong underestimation bias when estimating large amounts of MI. We propose decomposing the full MI estimation… 
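
For orientation, the underestimation issue follows from the standard result that a contrastive (InfoNCE-style) estimator built from K samples can itself never exceed log K, while the decomposition idea rests on the chain rule of mutual information. The sketch below shows both identities for a view split into two parts; the paper's exact decomposition may differ in detail.

  \[
    I(x; y) \;\ge\; \hat{I}_{\mathrm{NCE}}(x; y), \qquad \hat{I}_{\mathrm{NCE}}(x; y) \;\le\; \log K
  \]
  \[
    I(x; y_1, y_2) \;=\; I(x; y_1) \;+\; I(x; y_2 \mid y_1)
  \]

Each term in the decomposition has to account for less mutual information than the total, so each sits more comfortably under its own log K ceiling.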

Citations of this paper

miCSE: Mutual Information Contrastive Learning for Low-shot Sentence Embeddings

This paper presents miCSE, a mutual information-based contrastive learning framework that significantly advances the state of the art in few-shot sentence embedding.

Self-Contrastive Learning: An Efficient Supervised Contrastive Framework with Single-view and Sub-network

An efficient supervised contrastive learning framework that self-contrasts among multiple outputs from different levels of a multi-exit network is proposed, called Self-Contrastive (SelfCon) learning, and the MI bound for the SelfCon loss is proved in a supervised, single-viewed framework.

Interventional Contrastive Learning with Meta Semantic Regularizer

A backdoor-adjustment-based regularization method, namely Interventional Contrastive Learning with Meta Semantic Regularizer (ICL-MSR), is proposed to perform causal intervention on a proposed Structural Causal Model (SCM) that treats the image background as a confounder.

Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views

  • Junbo Zhang, Kaisheng Ma
  • Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2022
This paper proposes to learn different augmentation invariances at different depths of the model according to the importance of each data augmentation, instead of learning representational invariances evenly in the backbone, and expands the contrast content with augmentation embeddings to reduce the misleading effects of strong data augmentations.

Online Continual Learning through Mutual Information Maximization

This paper proposes a new online continual learning technique called OCM based on mutual information maximization that substantially outperforms the online CL baselines and encourages preservation of the previously learned knowledge when training a new batch of incrementally arriving data.

Self-Distilled Self-Supervised Representation Learning

The method, Self-Distilled Self-Supervised Learning (SDSSL), outperforms competitive baselines (SimCLR, BYOL and MoCo v3) with ViT backbones on various tasks and datasets, and leads to superior performance not only in the final layers but also in most of the lower layers.

Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning

This paper empirically demonstrates that the proposed Conditional Contrastive Learning approach achieves state-of-the-art downstream performances compared to unsupervised baselines and improves the fairness of contrastive SSL models on multiple fairness metrics.

Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning

The theoretical analysis shows that DOMINO can overcome the underestimation of mutual information caused by multiple confounders by learning a disentangled context, and can reduce the number of samples that need to be collected across environments.

Mutual Information-guided Knowledge Transfer for Novel Class Discovery

The proposed method for transferring semantic knowledge between seen and unseen classes outperforms the previous state of the art by a significant margin on several benchmarks, and maximizing mutual information is shown to promote the transfer of semantic knowledge.

References

Showing 1-10 of 58 references

On Mutual Information Maximization for Representation Learning

This paper argues, and provides empirical evidence, that the success of these methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators.

Learning Representations by Maximizing Mutual Information Across Views

This work develops a model which learns image representations that significantly outperform prior methods on the tasks the authors consider, and extends this model to use mixture-based representations, where segmentation behaviour emerges as a natural side-effect.

Multi-label Contrastive Predictive Coding

This work introduces a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time, and shows that using the same number of negative samples, multi-label CPC is able to exceed the $\log m$ bound while still being a valid lower bound on mutual information.

What Makes for Good Views for Contrastive Learning?

This paper uses empirical analysis to better understand the importance of view selection, argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devises unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
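
Stated a bit more formally (this is a paraphrase of the paper's "InfoMin" principle from memory, so treat the exact formulation as an assumption): among view pairs that preserve all task-relevant information, the best pair is the one that shares the least information overall.

  \[
    (v_1^{*}, v_2^{*}) \;=\; \arg\min_{v_1, v_2} \; I(v_1; v_2)
    \quad \text{subject to} \quad I(v_1; y) = I(v_2; y) = I(x; y),
  \]

where x is the input, y the downstream label, and v_1, v_2 the learned views.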

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

This paper proposes an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed, using a swapped prediction mechanism in which the cluster assignment of one view is predicted from the representation of another view.
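
A minimal PyTorch-style sketch of the swapped prediction mechanism described above; variable names are illustrative, and the published method additionally computes the target codes with the Sinkhorn-Knopp algorithm under an equipartition constraint, which is replaced here by a plain softmax with a stop-gradient.

  import torch
  import torch.nn.functional as F

  def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1):
      # z1, z2: L2-normalized projections of two views of the same batch, (B, D)
      # prototypes: learnable prototype matrix, (D, K)
      scores1 = z1 @ prototypes / temperature   # (B, K)
      scores2 = z2 @ prototypes / temperature
      # Target "codes" for each view; SwAV obtains these via Sinkhorn-Knopp,
      # approximated here by a detached softmax for brevity.
      q1 = F.softmax(scores1, dim=1).detach()
      q2 = F.softmax(scores2, dim=1).detach()
      # Swapped prediction: predict each view's codes from the other view.
      loss = -0.5 * ((q2 * F.log_softmax(scores1, dim=1)).sum(dim=1)
                     + (q1 * F.log_softmax(scores2, dim=1)).sum(dim=1)).mean()
      return loss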

Conditional Noise-Contrastive Estimation of Unnormalised Models

The proposed method shares with NCE the idea of formulating density estimation as a supervised learning problem, but in contrast to NCE it leverages the observed data when generating noise samples, which can thus be produced in a semi-automated manner.

A Simple Framework for Contrastive Learning of Visual Representations

It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
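
The contrastive objective behind this framework is the NT-Xent loss over two augmented views passed through a projection head; a compact sketch follows, with tensor shapes and names as assumptions rather than the authors' code.

  import torch
  import torch.nn.functional as F

  def nt_xent_loss(h1, h2, temperature=0.5):
      # h1, h2: projection-head outputs for two augmentations of a batch, (B, D)
      b = h1.size(0)
      z = F.normalize(torch.cat([h1, h2], dim=0), dim=1)   # (2B, D)
      sim = z @ z.t() / temperature                         # cosine similarities
      sim.fill_diagonal_(float('-inf'))                     # never contrast with self
      # The positive for example i is its other augmented view.
      targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
      return F.cross_entropy(sim, targets)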

Learning deep representations by mutual information estimation and maximization

It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.

C-MI-GAN : Estimation of Conditional Mutual Information using MinMax formulation

This work focuses on conditional mutual information estimation by utilizing its formulation as a minmax optimization problem, and finds that the proposed estimator provides better estimates than the existing approaches on a variety of simulated data sets comprising linear and non-linear relations between variables.
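
For reference, the quantity targeted by such estimators is the conditional mutual information, whose standard definition is reproduced below; the min-max estimator approaches it through a variational (GAN-style) bound.

  \[
    I(X; Y \mid Z)
    \;=\; \mathbb{E}_{p(z)}\!\left[ D_{\mathrm{KL}}\!\big( p(x, y \mid z) \,\|\, p(x \mid z)\, p(y \mid z) \big) \right]
    \;=\; \mathbb{E}_{p(x, y, z)}\!\left[ \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)} \right].
  \]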

Representation Learning with Contrastive Predictive Coding

This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
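
At the core of Contrastive Predictive Coding is an InfoNCE loss in which a context vector scores the true future latent against in-batch negatives through a bilinear critic; below is a stripped-down sketch with illustrative names and shapes, not the authors' implementation.

  import torch
  import torch.nn.functional as F

  def info_nce_loss(context, future, W):
      # context: context vectors c_t from an autoregressive encoder, (B, C)
      # future:  encoded future latents z_{t+k} for the same items,  (B, Z)
      # W:       learnable bilinear critic weights, (C, Z)
      logits = (context @ W) @ future.t()    # (B, B) critic scores
      # Row i's positive is future latent i; the other rows serve as negatives.
      targets = torch.arange(context.size(0), device=context.device)
      # Minimizing this cross-entropy maximizes a lower bound on I(c_t; z_{t+k}),
      # capped at log B for a batch of size B.
      return F.cross_entropy(logits, targets)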
...