Corpus ID: 220495771

Fisher Auto-Encoders

Khalil Elkhalil, Ali Hasan, Jie Ding, Sina Farsiu, Vahid Tarokh
International Conference on Artificial Intelligence and Statistics
It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AE) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables and the postulated/modeled joint distribution. In contrast to KL-based…
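
For orientation, the two divergences contrasted in the abstract can be written as follows. This is a sketch using commonly cited definitions: the symbols p, q, and x are assumed here and do not come from the abstract, and some authors include a factor of 1/2 in the Fisher divergence.

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_{x \sim p}\left[ \log p(x) - \log q(x) \right]

D_{\mathrm{F}}(p \,\|\, q) = \mathbb{E}_{x \sim p}\left[ \left\| \nabla_x \log p(x) - \nabla_x \log q(x) \right\|_2^2 \right]
```

The Fisher divergence depends on q only through the gradient of log q, so the normalizing constant of q drops out. In the setting described by the abstract, the divergence is taken between joint distributions over observed data and latent variables rather than over x alone.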

On the failure of variational score matching for VAE models

A critical study of existing variational SM objectives is presented, showing catastrophic failure on a wide range of datasets and network architectures and suggesting that only ELBO and the baseline objective robustly produce expected results, while previously proposed SM methods do not.

Skewed Jensen–Fisher Divergence and Its Bounds

A skewed Jensen–Fisher divergence is introduced based on relative Fisher information, and some bounds are provided in terms of the skewed Jensen–Shannon divergence and of the variational distance.

Probabilistic Autoencoder Using Fisher Information

In this work, an extension to the autoencoder architecture is introduced, the FisherNet, which has advantages from a theoretical point of view as it provides a direct uncertainty quantification derived from the model and also accounts for uncertainty cross-correlations.

Fast approximations of the Jeffreys divergence between univariate Gaussian mixture models via exponential polynomial densities

This work proposes a simple yet fast heuristic to approximate the Jeffreys divergence between two GMMs with an arbitrary number of components. It considers polynomial exponential densities (PEDs) and designs a goodness-of-fit criterion, a generalization of the Hyvärinen divergence, to measure the dissimilarity between a GMM and a PED.

Fast Approximations of the Jeffreys Divergence between Univariate Gaussian Mixtures via Mixture Conversions to Exponential-Polynomial Distributions

This paper proposes a simple yet fast heuristic to approximate the Jeffreys divergence between two univariate Gaussian mixtures with an arbitrary number of components and demonstrates that this heuristic is several orders of magnitude faster than stochastic Monte Carlo estimation.



A Connection Between Score Matching and Denoising Autoencoders

A proper probabilistic model for the denoising autoencoder technique is defined, which makes it possible in principle to sample from denoising autoencoders or rank examples by their energy. A different way to apply score matching, one that is related to learning to denoise and does not require computing second derivatives, is also suggested.

Contractive Auto-Encoders: Explicit Invariance During Feature Extraction

It is found empirically that this penalty helps to carve a representation that better captures the local directions of variation dictated by the data, corresponding to a lower-dimensional non-linear manifold, while being more invariant to the vast majority of directions orthogonal to the manifold.

Auto-Encoding Variational Bayes

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

Generative Adversarial Nets

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a…

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Gradient Information for Representation and Modeling

This paper presents a new set of information quantities, referred to as gradient information, applies these measures to the Chow-Liu tree algorithm, and demonstrates remarkable performance and significant computational reduction using both synthetic and real data.

Deep Learning Face Attributes in the Wild

A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.

Estimation of Non-Normalized Statistical Models by Score Matching

While the estimation of the gradient of the log-density function is, in principle, a very difficult non-parametric problem, a surprising result is proved that gives a simple formula for the objective: it reduces to a sample average of a sum of some derivatives of the log-density given by the model.
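
The "sample average of derivatives" form mentioned in this summary can be illustrated with a minimal sketch for a one-dimensional Gaussian model. Everything here is an assumption made for illustration and is not code from the paper: the function name `score_matching_objective`, the synthetic data, and the parameter values are all hypothetical. The objective averages the derivative of the model score plus half the squared score over the samples, and never evaluates the data density.

```python
import random


def score_matching_objective(xs, mu, var):
    """Score matching objective for a 1-D Gaussian model N(mu, var).

    Model score:     psi(x)  = d/dx log q(x) = -(x - mu) / var
    Its derivative:  psi'(x) = -1 / var
    Objective:       sample mean of psi'(x) + 0.5 * psi(x) ** 2
    """
    total = 0.0
    for x in xs:
        psi = -(x - mu) / var
        dpsi = -1.0 / var
        total += dpsi + 0.5 * psi * psi
    return total / len(xs)


# Hypothetical synthetic data: 5000 draws from N(2.0, 1.5**2).
random.seed(0)
xs = [random.gauss(2.0, 1.5) for _ in range(5000)]

# For this model the objective is minimized at the sample mean and the
# (biased) sample variance, which can be checked by setting its
# derivatives with respect to mu and 1/var to zero.
mu_hat = sum(xs) / len(xs)
var_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)

j_best = score_matching_objective(xs, mu_hat, var_hat)
j_shifted = score_matching_objective(xs, mu_hat + 0.5, var_hat)
j_scaled = score_matching_objective(xs, mu_hat, 2.0 * var_hat)

print(j_best, j_shifted, j_scaled)
```

In this Gaussian case the minimizer coincides with the maximum-likelihood estimate, so perturbing either parameter raises the objective.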

Bayesian Model Comparison with the Hyvärinen Score: Computation and Consistency

A method to consistently estimate the Hyvärinen score, a difference of out-of-sample predictive scores under the logarithmic scoring rule, is proposed for parametric models, using sequential Monte Carlo methods, and it is shown that this score can be estimated for models with tractable likelihoods as well as nonlinear non-Gaussian state-space models with intractable likelihoods.

Graphical Models, Exponential Families, and Variational Inference

The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.