Corpus ID: 230770462

A unified view for unsupervised representation learning with density ratio estimation: Maximization of mutual information, nonlinear ICA and nonlinear subspace estimation

Hiroaki Sasaki and Takashi Takenouchi

Unsupervised representation learning is one of the most important problems in machine learning. Recent promising methods are based on contrastive learning. However, contrastive learning often relies on heuristic ideas, so it is not easy to understand what it is actually doing. This paper emphasizes that density ratio estimation is a promising goal for unsupervised representation learning, and promotes understanding of contrastive learning. Our primary contribution is to…

On Mutual Information Maximization for Representation Learning
This paper argues, and provides empirical evidence, that the success of mutual information (MI) maximization methods cannot be attributed to the properties of MI alone, and that they strongly depend on the inductive bias in both the choice of feature extractor architectures and the parametrization of the employed MI estimators.
Robust contrastive learning and nonlinear ICA in the presence of outliers
This paper develops two robust nonlinear ICA methods based on the γ-divergence, a robust alternative to the KL divergence in logistic regression. The methods are applied to ICA-based causal discovery and shown to find a plausible causal relationship on fMRI data.
Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning
This work provides a comprehensive proof of the identifiability of the model as well as the consistency of the estimation method, and proposes to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized.
Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces
This paper proposes a novel method of dimensionality reduction for supervised learning problems that requires neither assumptions on the marginal distribution of X nor a parametric model of the conditional distribution of Y, and establishes a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces.
Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA
This work proposes a new intuitive principle of unsupervised deep learning from time series that uses the nonstationary structure of the data, and shows how time-contrastive learning (TCL) can be related to a nonlinear ICA model when ICA is redefined to include temporal nonstationarities.
Representation Learning: A Review and New Perspectives
Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Learning Representations by Maximizing Mutual Information Across Views
This work develops a model which learns image representations that significantly outperform prior methods on the tasks the authors consider, and extends this model to use mixture-based representations, where segmentation behaviour emerges as a natural side-effect.
A Survey of Multi-View Representation Learning
This survey aims to provide an insightful overview of theoretical foundation and state-of-the-art developments in the field of multi-view representation learning and to help researchers find the most appropriate tools for particular applications.
Learning deep representations by mutual information estimation and maximization
It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.
An Information-Maximization Approach to Blind Separation and Blind Deconvolution
It is suggested that information maximization provides a unifying framework for problems in "blind" signal processing, and dependencies of information transfer on time delays are derived.