Representation Learning: A Review and New Perspectives

Yoshua Bengio, Aaron C. Courville, Pascal Vincent
IEEE Transactions on Pattern Analysis and Machine Intelligence
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors… 

A Modular Theory of Feature Learning
This work proposes the idea of a risk gap induced by representation learning for a given prediction context, which measures the difference in the risk of some learner using the learned features as compared to the original inputs.
Sparse, hierarchical and shared-factors priors for representation learning
This work presents a series of contributions aimed at improving the quality of the representations learned by Sparse Dictionary Learning approaches on the problem of grasp detection, along with an empirical analysis of their advantages and disadvantages.
The Role of the Information Bottleneck in Representation Learning
This work derives an upper bound to the so-called generalization gap corresponding to the cross-entropy loss and shows that when this bound times a suitable multiplier and the empirical risk are minimized jointly, the problem is equivalent to optimizing the Information Bottleneck objective with respect to the empirical data-distribution.
For Manifold Learning, Deep Neural Networks can be Locality Sensitive Hash Functions
Theoretical and empirical evidence is provided that neural representations can be viewed as LSH-like functions that map each input to an embedding that is a function solely of the informative γ and is invariant to θ, effectively recovering the manifold identifier γ.
Unsupervised Learning Under Uncertainty
Two methods to address the problem of video prediction are introduced, first using a novel form of linearizing auto-encoder and latent variables, and secondly using Generative Adversarial Networks (GANs), to show how GANs can be seen as trainable loss functions to represent uncertainty, then how they can be used to disentangle factors of variation.
Learning deep representations by mutual information estimation and maximization
It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.
Autonomous Learning of Representations
The goal of this contribution is to give an overview of different principles of autonomous feature learning, and to exemplify two of these principles with two recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language.
Relation-Guided Representation Learning


Deep Learning of Representations for Unsupervised and Transfer Learning
  • Yoshua Bengio
  • Computer Science
    ICML Unsupervised and Transfer Learning
  • 2012
This paper discusses why unsupervised pre-training of representations can be useful, and how it can be exploited in the transfer learning scenario, where the authors care about predictions on examples that are not from the same distribution as the training distribution.
Unsupervised and Transfer Learning Challenge: a Deep Learning Approach
This paper describes different kinds of layers the authors trained for learning representations in the setting of the Unsupervised and Transfer Learning Challenge, and the particular one-layer learning algorithms feeding a simple linear classifier with a tiny number of labeled training samples.
Sparse Feature Learning for Deep Belief Networks
This work proposes a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation, and describes a novel and efficient algorithm to learn sparse representations.
Extracting and composing robust features with denoising autoencoders
This work introduces and motivates a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern.
The Manifold Tangent Classifier
A representation learning algorithm can be stacked to yield a deep architecture and it is shown how it builds a topological atlas of charts, each chart being characterized by the principal singular vectors of the Jacobian of a representation mapping.
Why Does Unsupervised Pre-training Help Deep Learning?
The results suggest that unsupervised pre-training guides the learning towards basins of attraction of minima that support better generalization from the training data set; the evidence from these results supports a regularization explanation for the effect of pre-training.
Understanding Representations Learned in Deep Architectures
It is shown that consistent filter-like interpretation is possible and simple to accomplish at the unit level and it is hoped that such techniques will allow researchers in deep architectures to understand more of how and why deep architectures work.
Large-Scale Learning of Embeddings with Reconstruction Sampling
This work presents a novel method to speed up the learning of embeddings for large-scale learning tasks involving very sparse data, as is typically the case for Natural Language Processing tasks; it approximates the reconstruction error by a sampling procedure.
A Generative Process for sampling Contractive Auto-Encoders
A procedure is proposed for generating samples that are consistent with the local structure captured by a contractive auto-encoder; experimentally, it appears to converge quickly and mix well between modes, compared to Restricted Boltzmann Machines and Deep Belief Networks.
On deep generative models with applications to recognition
This work uses one of the best pixel-level generative models of natural images, a gated MRF, as the lowest level of a deep belief network that has several hidden layers and shows that the resulting DBN is very good at coping with occlusion when predicting expression categories from face images.