# On Linear Identifiability of Learned Representations

@article{Roeder2020OnLI, title={On Linear Identifiability of Learned Representations}, author={Geoffrey Roeder and Luke Metz and Diederik P. Kingma}, journal={ArXiv}, year={2020}, volume={abs/2007.00810} }

Identifiability is a desirable property of a statistical model: it implies that the true model parameters may be estimated to any desired precision, given sufficient computational resources and data. We study identifiability in the context of representation learning: discovering nonlinear data representations that are optimal with respect to some downstream task. When parameterized as deep neural networks, such representation functions typically lack identifiability in parameter space, because…

## 24 Citations

### I Don't Need u: Identifiable Non-Linear ICA Without Side Information

- Computer Science, Mathematics
- ArXiv
- 2021

Surprisingly, it is found that side information is not necessary for algorithmic stability: using standard quantitative measures of identifiability, deep generative models with latent clusterings are empirically identifiable to the same degree as models which rely on auxiliary labels.

### On Pitfalls of Identifiability in Unsupervised Learning. A Note on: "Desiderata for Representation Learning: A Causal Perspective"

- Mathematics
- ArXiv
- 2022

Model identifiability is a desirable property in the context of unsupervised representation learning. In absence thereof, different models may be observationally indistinguishable while yielding…

### On Algorithmic Stability in Unsupervised Representation Learning

- Computer Science
- 2022

Surprisingly, it is found that side information is not necessary for algorithmic stability: using standard quantitative measures of identifiability, deep generative models with latent clusterings are empirically identifiable to the same degree as models which rely on auxiliary labels.

### Identifiability of deep generative models without auxiliary information

- Computer Science
- 2022

We prove identifiability of a broad class of deep latent variable models that (a) have universal approximation capabilities and (b) are the decoders of variational autoencoders that are commonly used…

### Identifiability of deep generative models under mixture priors without auxiliary information

- Computer Science
- ArXiv
- 2022

We prove identifiability of a broad class of deep latent variable models that (a) have universal approximation capabilities and (b) are the decoders of variational autoencoders that are commonly used…

### Visual Representation Learning Does Not Generalize Strongly Within the Same Domain

- Computer Science
- ICLR
- 2022

This paper tests whether 17 unsupervised, weakly supervised, and fully supervised representation learning approaches correctly infer the generative factors of variation in simple datasets, and observes that all of them struggle to learn the underlying mechanism regardless of supervision signal and architectural bias.

### Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style

- Computer Science
- NeurIPS
- 2021

Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, is used to study the effect of data augmentations performed in practice; numerical simulations with dependent latent variables are consistent with theory.

### Generalized Shape Metrics on Neural Representations

- Computer Science
- NeurIPS
- 2021

This work defines a broad family of metric spaces that quantify representational dissimilarity and forms a novel metric that respects the inductive biases in convolutional layers and identifies approximate Euclidean embeddings that enable network representations to be incorporated into essentially any off-the-shelf machine learning method.

### Towards efficient representation identification in supervised learning

- Computer Science
- CLeaR
- 2022

This work analyzes the problem of disentanglement in a natural setting where latent factors cause the labels, a setting not well studied in the ICA literature, and shows that if ERM is constrained to learn independent representations, then the latent factors can be recovered from the learned representations even when the number of tasks is small.

### Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA

- Computer Science
- CLeaR
- 2022

A rigorous identifiability theory is developed, building on recent nonlinear independent component analysis (ICA) results, that shows how the latent variables can be recovered up to permutation if one regularizes the latent mechanisms to be sparse and if some graph connectivity criterion is satisfied by the data generating process.

## References

### Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning

- Computer Science
- AISTATS
- 2019

This work provides a comprehensive proof of the identifiability of the model as well as the consistency of the estimation method, and proposes to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized.

### ICE-BeeM: Identifiable Conditional Energy-Based Deep Models

- Computer Science
- NeurIPS
- 2020

This paper establishes sufficient conditions under which a large family of conditional energy-based models is identifiable in function space, up to a simple transformation, and proposes the framework of independently modulated component analysis (IMCA), a new form of nonlinear ICA where the independence assumption is relaxed.

### Variational Autoencoders and Nonlinear ICA: A Unifying Framework

- Computer Science
- AISTATS
- 2020

This work shows that for a broad family of deep latent-variable models, identification of the true joint distribution over observed and latent variables is actually possible up to very simple transformations, thus achieving a principled and powerful form of disentanglement.

### Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN)

- Computer Science
- ICLR
- 2020

This work generalizes the theory to the case of unknown intrinsic problem dimension and proves that in some special (but not very restrictive) cases, informative latent variables will be automatically separated from noise by an estimating model.

### Representation Learning with Contrastive Predictive Coding

- Computer Science
- ArXiv
- 2018

This work proposes a universal unsupervised learning approach to extract useful representations from high-dimensional data, which it calls Contrastive Predictive Coding, and demonstrates that the approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.

### Adam: A Method for Stochastic Optimization

- Computer Science
- ICLR
- 2015

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

### SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability

- Computer Science
- NIPS
- 2017

We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing…

### Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

- Computer Science
- NIPS
- 2016

This work proposes a new intuitive principle of unsupervised deep learning from time series which uses the nonstationary structure of the data, and shows how time-contrastive learning (TCL) can be related to a nonlinear ICA model, when ICA is redefined to include temporal nonstationarities.

### Extracting and composing robust features with denoising autoencoders

- Computer Science
- ICML '08
- 2008

This work introduces and motivates a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern.

### Learning Multiple Layers of Features from Tiny Images

- Computer Science
- 2009

It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.