# Relative gradient optimization of the Jacobian term in unsupervised deep learning

@article{Gresele2020RelativeGO, title={Relative gradient optimization of the Jacobian term in unsupervised deep learning}, author={Luigi Gresele and Giancarlo Fissore and Adri{\'a}n Javaloy and Bernhard Sch{\"o}lkopf and Aapo Hyv{\"a}rinen}, journal={ArXiv}, year={2020}, volume={abs/2006.15090} }

Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning. A popular approach for solving it is mapping the observations into a representation space with a simple joint distribution, which can typically be written as a product of its marginals -- thus drawing a connection with the field of nonlinear independent component analysis. Deep density models have been widely used for this task, but their likelihood-based training requires…

## 15 Citations

### Self Normalizing Flows

- Computer Science
- ICML
- 2021

Self Normalizing Flows is proposed, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling.

### Training Neural Networks with Property-Preserving Parameter Perturbations

- Computer Science
- ArXiv
- 2020

This work presents a novel, general approach of preserving matrix properties by using parameterized perturbations in lieu of directly optimizing the network parameters, and shows how such invertible blocks improve the mixing of coupling layers and thus the mode separation of the resulting normalizing flows.

### Preserving Properties of Neural Networks by Perturbative Updates

- Computer Science
- 2020

This work presents a novel, general approach to preserve network properties by using parameterized perturbations, and shows how such invertible blocks improve mode separation when applied to normalizing flows and Boltzmann generators.

### Training Invertible Linear Layers through Rank-One Perturbations.

- Computer Science
- 2020

This work presents a novel approach for training invertible linear layers by training rank-one perturbations and adding them to the actual weight matrices infrequently, which allows keeping track of inverses and determinants without ever explicitly computing them.
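The summary above does not spell out how inverses and determinants can be tracked under rank-one changes. One standard way, shown here as a minimal sketch and not necessarily the cited paper's exact procedure, combines the Sherman-Morrison formula with the matrix determinant lemma. The function name `rank_one_update` and the toy dimensions are illustrative assumptions.

```python
import numpy as np


def rank_one_update(w_inv, log_abs_det, u, v):
    """Return the inverse and log|det| of W + u v^T, given those of W.

    Hypothetical helper for illustration; W itself is never inverted here.
    """
    w_inv_u = w_inv @ u
    v_w_inv = v @ w_inv
    # Matrix determinant lemma: det(W + u v^T) = det(W) * (1 + v^T W^{-1} u).
    factor = 1.0 + v @ w_inv_u
    if abs(factor) < 1e-12:
        raise ValueError("rank-one update would make the matrix singular")
    # Sherman-Morrison formula for the inverse of the perturbed matrix.
    new_inv = w_inv - np.outer(w_inv_u, v_w_inv) / factor
    new_log_abs_det = log_abs_det + np.log(abs(factor))
    return new_inv, new_log_abs_det


# Toy run: start from the identity and apply three small rank-one updates.
rng = np.random.default_rng(0)
dim = 4
w = np.eye(dim)
w_inv, log_abs_det = np.eye(dim), 0.0
for _ in range(3):
    u = 0.1 * rng.standard_normal(dim)
    v = 0.1 * rng.standard_normal(dim)
    w = w + np.outer(u, v)
    w_inv, log_abs_det = rank_one_update(w_inv, log_abs_det, u, v)
```

Each update costs O(d²) rather than the O(d³) of a fresh inversion or determinant, which is the kind of saving the summary alludes to.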

### Hidden Markov Nonlinear ICA: Unsupervised Learning from Nonstationary Time Series

- Computer Science
- UAI
- 2020

This work combines nonlinear ICA with a Hidden Markov Model, resulting in a model where a latent state acts in place of the observed segment-index, and proves identifiability of the proposed model for a general mixing nonlinearity, such as a neural network.

### RANK-ONE PERTURBATIONS

- Computer Science
- 2021

This work presents a novel approach for training invertible linear layers by training rank-one perturbations and adding them to the actual weight matrices infrequently, which allows keeping track of inverses and determinants without ever explicitly computing them.

### DA-AE: Disparity-Alleviation Auto-Encoder Towards Categorization of Heritage Images for Aggrandized 3D Reconstruction

- Computer Science, Sociology
- 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2022

This paper proposes DA-AE for improved representation and categorization of data in latent space, along with a disparity alleviation loss, and demonstrates categorization as an event, with clustering as a downstream task.

### Probing the Robustness of Independent Mechanism Analysis for Representation Learning

- Computer Science
- ArXiv
- 2022

It is shown that unregularized maximum likelihood recovers mixing functions which systematically deviate from the IMA principle, and an argument elucidating the benefits of IMA-based regularization is provided.

## References

### NICE: Non-linear Independent Components Estimation

- Computer Science, Mathematics
- ICLR
- 2015

We propose a deep learning framework for modeling complex high-dimensional densities called Non-linear Independent Component Estimation (NICE). It is based on the idea that a good representation is…

### High-Dimensional Probability Estimation with Deep Density Models

- Computer Science
- ArXiv
- 2013

The deep density model (DDM) is introduced, a new approach to density estimation that exploits insights from deep learning to construct a bijective map to a representation space, under which the transformation of the distribution of the data is approximately factorized and has identical and known marginal densities.

### Large Scale Variational Inference and Experimental Design for Sparse Generalized Linear Models

- Computer Science
- Sampling-based Optimization in the Presence of Uncertainty
- 2009

A long-standing open question about variational Bayesian inference for continuous variable models is settled, and the Gaussian lower bound relaxation is proved to be a convex optimization problem, if and only if the posterior mode is found by convex programming.

### FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models

- Computer Science, Mathematics
- ICLR
- 2019

This paper uses Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density and demonstrates the approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
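As a rough illustration of the Hutchinson trace estimator mentioned above, the sketch below estimates tr(A) using only matrix-vector products with Rademacher probe vectors. It is a generic NumPy version under stated assumptions, not FFJORD's implementation; the function name `hutchinson_trace`, the `matvec` callback, and the sample count are all hypothetical choices.

```python
import numpy as np


def hutchinson_trace(matvec, dim, num_samples=1000, seed=0):
    """Unbiased Monte Carlo estimate of tr(A), given only v -> A @ v.

    Relies on E[eps^T A eps] = tr(A) when E[eps eps^T] = I.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        # Rademacher probe: entries are +1 or -1 with equal probability.
        eps = rng.choice([-1.0, 1.0], size=dim)
        total += eps @ matvec(eps)
    return total / num_samples


# Toy check against an explicit matrix (an assumption made for illustration;
# in a flow model A would be a Jacobian accessed through autodiff products).
a = 0.1 * np.random.default_rng(1).standard_normal((5, 5))
estimate = hutchinson_trace(lambda vec: a @ vec, dim=5)
exact = np.trace(a)
```

The appeal of this estimator is that the matrix never needs to be formed: one product per probe vector is enough, at the cost of Monte Carlo noise in the estimate.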

### Invertibility of Convolutional Generative Networks from Partial Measurements

- Computer Science
- NeurIPS
- 2018

It is rigorously proved that, under some mild technical assumptions, the input of a two-layer convolutional generative network can be deduced from the network output efficiently using simple gradient descent, implying that the mapping from the low-dimensional latent space to the high-dimensional image space is bijective.

### Invertible Convolutional Flow

- Computer Science
- NeurIPS
- 2019

This work investigates a set of novel normalizing flows based on the circular and symmetric convolutions and proposes an analytic approach to designing nonlinear elementwise bijectors that induce special properties in the intermediate layers, by implicitly introducing specific regularizers in the loss.

### Variational Inference: A Review for Statisticians

- Computer Science
- ArXiv
- 2016

Variational inference (VI), a method from machine learning that approximates probability densities through optimization, is reviewed and a variant that uses stochastic optimization to scale up to massive data is derived.

### i-RevNet: Deep Invertible Networks

- Computer Science
- ICLR
- 2018

The i-RevNet is built, a network that can be fully inverted up to the final projection onto the classes, i.e. no information is discarded, and linear interpolations between natural image representations are reconstructed.

### Density estimation using Real NVP

- Computer Science
- ICLR
- 2017

This work extends the space of probabilistic models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space.

### Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning

- Computer Science
- AISTATS
- 2019

This work provides a comprehensive proof of the identifiability of the model as well as the consistency of the estimation method, and proposes to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized.