Corpus ID: 49670925

Representation Learning with Contrastive Predictive Coding

@article{Oord2018RepresentationLW,
  title={Representation Learning with Contrastive Predictive Coding},
  author={A{\"a}ron van den Oord and Yazhe Li and Oriol Vinyals},
  journal={ArXiv},
  year={2018},
  volume={abs/1807.03748}
}
While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples.
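The probabilistic contrastive loss the abstract refers to is InfoNCE: a cross-entropy over similarity scores in which the encoded true future must be picked out from a set of negatives. A minimal NumPy sketch under assumed names (`z_pred` for the model's predicted latents, `z_pos` for the encoded true futures, with the other rows of the batch serving as negatives; `temperature` is an illustrative hyperparameter, not from the paper):

```python
import numpy as np

def info_nce_loss(z_pred, z_pos, temperature=0.1):
    """InfoNCE loss for one batch.

    z_pred: (N, D) predicted future latents.
    z_pos:  (N, D) encoded true future latents; row i is the
            positive for z_pred[i], all other rows act as negatives.
    """
    # L2-normalise both sets of vectors so the dot product is cosine similarity.
    z_pred = z_pred / np.linalg.norm(z_pred, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)

    # Pairwise similarity logits: logits[i, j] = sim(pred_i, pos_j) / temperature.
    logits = z_pred @ z_pos.T / temperature

    # Cross-entropy with the diagonal (the matched pairs) as the correct class.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))
```

Minimising this loss maximises a lower bound on the mutual information between the prediction and the true future latent, which is why matched pairs score far lower loss than random pairings.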
Hybrid Generative-Contrastive Representation Learning
TLDR
It is demonstrated that a transformer-based encoder-decoder architecture trained with both contrastive and generative losses can learn highly discriminative and robust representations without hurting the generative performance.
Unsupervised State Representation Learning in Atari
TLDR
This work introduces a method that learns state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations and introduces a new benchmark based on Atari 2600 games to evaluate representations based on how well they capture the ground truth state variables.
Function Contrastive Learning of Transferable Representations
TLDR
This work proposes a contrastive learning method which is not trained to solve a set of tasks, but rather attempts to find a good representation of the underlying data-generating processes which allows for finding representations which are useful for an entire series of tasks sharing the same function.
Contrasting Contrastive Self-Supervised Representation Learning Models
TLDR
This paper analyzes contrastive approaches as one of the most successful and popular variants of self-supervised representation learning and examines over 700 training experiments including 30 encoders, 4 pre-training datasets and 20 diverse downstream tasks.
Rethinking Image Mixture for Unsupervised Visual Representation Learning
TLDR
Despite its conceptual simplicity, it is shown empirically that with the simple solution -- image mixture, the authors can learn more robust visual representations from the transformed input, and the benefits of representations learned from this space can be inherited by the linear classification and downstream tasks.
i-Mix: A Strategy for Regularizing Contrastive Representation Learning
TLDR
It is demonstrated that i-Mix consistently improves the quality of self-supervised representations across domains, resulting in significant performance gains on downstream tasks, and its regularization effect is confirmed via extensive ablation studies across model and dataset sizes.
Novelty Detection via Rotated Contrastive Predictive Coding
  • 2020
The current dominant paradigm for novelty detection relies on a learned model's capability to recover the regularities. To this end, reconstruction-based learning is often used, in which the normality…
Return-Based Contrastive Representation Learning for Reinforcement Learning
TLDR
This work proposes a novel auxiliary task that forces the learnt representations to discriminate state-action pairs with different returns, and achieves even better performance when combined with existing auxiliary tasks.
Data-Efficient Reinforcement Learning with Self-Predictive Representations
TLDR
The method, Self-Predictive Representations (SPR), trains an agent to predict its own latent state representations multiple steps into the future using an encoder which is an exponential moving average of the agent’s parameters and a learned transition model.
Data-Efficient Reinforcement Learning with Momentum Predictive Representations
TLDR
This work trains an agent to predict its own latent state representations multiple steps into the future using an encoder which is an exponential moving average of the agent's parameters, and makes predictions using a learned transition model.
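Both SPR entries above describe the same mechanism: a target encoder whose parameters are an exponential moving average of the online encoder's parameters. A minimal sketch of that update, assuming parameters stored as plain name-to-value dicts and a momentum hyperparameter `tau` (names are illustrative, not from the paper):

```python
def ema_update(target_params, online_params, tau=0.99):
    """One step of the exponential-moving-average target update:
    target <- tau * target + (1 - tau) * online.

    Both arguments map parameter names to floats (or arrays);
    the returned dict is the new target-parameter set.
    """
    return {name: tau * target_params[name] + (1 - tau) * online_params[name]
            for name in target_params}
```

Only the online network receives gradients; after each optimiser step the target parameters drift a small fraction (1 − tau) toward the online ones, which keeps the prediction targets slowly moving and stable.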

References

Showing 1–10 of 54 references
Adversarial Feature Learning
TLDR
Bidirectional Generative Adversarial Networks are proposed as a means of learning the inverse mapping of GANs, and it is demonstrated that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning.
Unsupervised Visual Representation Learning by Context Prediction
TLDR
It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset.
Multi-task Self-Supervised Visual Learning
TLDR
The results show that deeper networks work better, and that combining tasks, even via a naïve multi-head architecture, always improves performance.
Unsupervised Learning of Visual Representations Using Videos
  • X. Wang, A. Gupta
  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
TLDR
A simple yet surprisingly powerful approach for unsupervised learning of CNN that uses hundreds of thousands of unlabeled videos from the web to learn visual representations and designs a Siamese-triplet network with a ranking loss function to train this CNN representation.
Time-Contrastive Networks: Self-Supervised Learning from Multi-view Observation
TLDR
The first self-supervised results for end-to-end imitation learning of human motions with a real robot are shown, and the contrastive signal encourages the model to discover meaningful dimensions and attributes that can explain the changing state of objects and the world from visually similar frames.
Time-Contrastive Networks: Self-Supervised Learning from Video
TLDR
A self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints is proposed, and it is demonstrated that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be use as a reward function within a reinforcement learning algorithm.
Show and tell: A neural image caption generator
TLDR
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Skip-Thought Vectors
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage.
Identity Mappings in Deep Residual Networks
TLDR
The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
Conditional Image Generation with PixelCNN Decoders
TLDR
The gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.