Corpus ID: 208222388

Efficient Approximate Inference with Walsh-Hadamard Variational Inference

@article{Rossi2019EfficientAI,
  title={Efficient Approximate Inference with Walsh-Hadamard Variational Inference},
  author={Simone Rossi and S{\'e}bastien Marmin and Maurizio Filippone},
  journal={ArXiv},
  year={2019},
  volume={abs/1912.00015}
}
Variational inference offers scalable and flexible tools to tackle intractable Bayesian inference of modern statistical models like Bayesian neural networks and Gaussian processes. For largely over-parameterized models, however, the over-regularization property of the variational objective makes the application of variational inference challenging. Inspired by the literature on kernel methods, and in particular on structured approximations of distributions of random matrices, this paper… 
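As a rough illustration of the kind of structured random-matrix approximation the abstract refers to, the sketch below uses a fast Walsh-Hadamard transform to apply a weight matrix of the form W = S1 H diag(g) H S2 to a vector in O(D log D) time, with g drawn via a reparameterized Gaussian. The names `fwht` and `whvi_matvec` and the exact factorization are assumptions for illustration, not the paper's precise construction.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(D log D); D must be a power of 2."""
    x = x.copy()
    d = x.shape[0]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(d)  # normalized so that applying it twice gives the identity

def whvi_matvec(v, s1, s2, g):
    """Apply (S1 H diag(g) H S2) to v without materializing the D x D weight matrix."""
    return s1 * fwht(g * fwht(s2 * v))

# One Monte Carlo sample of the structured layer (illustrative values).
rng = np.random.default_rng(0)
D = 8                                   # power of 2
s1, s2 = rng.normal(size=D), rng.normal(size=D)
mu, sigma = np.zeros(D), 0.1 * np.ones(D)
g = mu + sigma * rng.normal(size=D)     # reparameterized sample g ~ N(mu, sigma^2)
v = rng.normal(size=D)
print(whvi_matvec(v, s1, s2, g))
```

The point of the structure is parameter efficiency: the layer is described by O(D) quantities (s1, s2, mu, sigma) rather than the O(D^2) entries of a dense Gaussian posterior over the weight matrix.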

References

Showing 1-10 of 32 references
Walsh-Hadamard Variational Inference for Bayesian Deep Learning
TLDR
Walsh-Hadamard Variational Inference is proposed, which uses Walsh-Hadamard-based factorization strategies to reduce the parameterization and accelerate computations, thus avoiding over-regularization issues with the variational objective.
Auto-Encoding Variational Bayes
TLDR
A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.
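The core device that makes that algorithm differentiable end to end is the reparameterization trick; below is a minimal sketch for a diagonal Gaussian posterior (variable names and values are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)

# Variational parameters of a diagonal Gaussian q(z) = N(mu, exp(log_std)^2).
mu = np.array([0.5, -1.0])
log_std = np.array([-0.3, 0.1])

# Reparameterization: z = mu + sigma * eps with eps ~ N(0, I), so a sample of z
# is a differentiable function of (mu, log_std) and gradients can flow through it.
eps = rng.normal(size=mu.shape)
z = mu + np.exp(log_std) * eps

# Closed-form KL(q || N(0, I)) term of the ELBO for this Gaussian case.
kl = 0.5 * np.sum(np.exp(2 * log_std) + mu**2 - 1.0 - 2.0 * log_std)
print(z, kl)
```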
Noisy Natural Gradient as Variational Inference
TLDR
It is shown that natural gradient ascent with adaptive weight noise implicitly fits a variational posterior to maximize the evidence lower bound (ELBO), which allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible to scale up to modern-size ConvNets.
Variational Inference with Normalizing Flows
TLDR
It is demonstrated that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provide a clear improvement in performance and applicability of variational inference.
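As a concrete example of the transformations used there, below is a single planar flow step with its change-of-variables correction; the parameter names and values are illustrative only.

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar flow f(z) = z + u * tanh(w.z + b) and its log |det Jacobian|.
    (Invertibility requires a constraint such as w.u >= -1, ignored in this toy sketch.)"""
    a = np.tanh(w @ z + b)
    f = z + u * a
    psi = (1.0 - a**2) * w                # gradient of tanh(w.z + b) w.r.t. z
    log_det = np.log(np.abs(1.0 + u @ psi))
    return f, log_det

rng = np.random.default_rng(2)
z0 = rng.normal(size=3)                   # sample from the simple base distribution
u, w, b = rng.normal(size=3), rng.normal(size=3), 0.1
z1, log_det = planar_flow(z0, u, w, b)
# Change of variables: log q1(z1) = log q0(z0) - log_det; stacking many such steps
# yields the richer posteriors the summary refers to.
print(z1, log_det)
```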
Practical Variational Inference for Neural Networks
  • A. Graves
  • Computer Science, Mathematics
  • NIPS, 2011
TLDR
This paper introduces an easy-to-implement stochastic variational method (or equivalently, minimum description length loss function) that can be applied to most neural networks and revisits several common regularisers from a variational perspective.
Good Initializations of Variational Bayes for Deep Models
TLDR
This work proposes a novel layer-wise initialization strategy based on Bayesian linear models that is extensively validated on regression and classification tasks, including Bayesian DeepNets and ConvNets, showing faster and better convergence compared to alternatives inspired by the literature on initializations for loss minimization.
Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors
TLDR
A variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices is introduced and "pseudo-data" (Snelson & Ghahramani, 2005) is incorporated in this model, which allows for more efficient posterior sampling while maintaining the properties of the original model.
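For intuition, sampling from a matrix-variate Gaussian posterior over a weight matrix only requires row and column covariance factors, which is what keeps such posteriors tractable; the sketch below is a generic illustration of that sampling step, not the paper's specific pseudo-data construction.

```python
import numpy as np

def sample_matrix_gaussian(M, U, V, rng):
    """Draw W ~ MN(M, U, V) via W = M + A E B^T, where U = A A^T and V = B B^T."""
    A = np.linalg.cholesky(U)             # row (input-side) covariance factor
    B = np.linalg.cholesky(V)             # column (output-side) covariance factor
    E = rng.standard_normal(M.shape)
    return M + A @ E @ B.T

rng = np.random.default_rng(3)
d_in, d_out = 4, 3
M = np.zeros((d_in, d_out))
U = 0.5 * np.eye(d_in)
V = 0.2 * np.eye(d_out)
W = sample_matrix_gaussian(M, U, V, rng)
print(W.shape)  # one posterior sample of the layer weights

# The Kronecker-structured covariance needs d_in^2 + d_out^2 parameters instead of
# (d_in * d_out)^2 for a full Gaussian over the flattened weight matrix.
```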
Random Feature Expansions for Deep Gaussian Processes
TLDR
A novel formulation of DGPs based on random feature expansions that is trained using stochastic variational inference and yields a practical learning framework which significantly advances the state-of-the-art in inference for DGPs, and enables accurate quantification of uncertainty.
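The random feature expansions in question approximate a kernel with an explicit finite feature map; a standard random Fourier feature sketch for an RBF kernel is shown below (the lengthscale and feature count are arbitrary choices for illustration).

```python
import numpy as np

def rff_features(X, n_features, lengthscale, rng):
    """Random Fourier features whose inner products approximate an RBF kernel."""
    d = X.shape[1]
    Omega = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)                 # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ Omega + b)

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 2))
Phi = rff_features(X, n_features=2048, lengthscale=1.0, rng=rng)
approx_K = Phi @ Phi.T
exact_K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / 2.0)  # RBF kernel, lengthscale 1
print(np.max(np.abs(approx_K - exact_K)))  # shrinks as n_features grows
```

Replacing each GP layer with such a finite feature map turns the layer into a Bayesian linear model, which is what makes stochastic variational inference applicable to the deep model.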
Ladder Variational Autoencoders
TLDR
A new inference model is proposed, the Ladder Variational Autoencoder, that recursively corrects the generative distribution by a data dependent approximate likelihood in a process resembling the recently proposed Ladder Network.
Variational Dropout Sparsifies Deep Neural Networks
TLDR
Variational Dropout is extended to the case when dropout rates are unbounded, a way to reduce the variance of the gradient estimator is proposed, and first experimental results with individual dropout rates per weight are reported.
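A rough picture of the mechanism: each weight receives multiplicative Gaussian noise with its own learnable rate alpha, and weights whose alpha grows large carry almost no signal and can be pruned. The shapes and pruning threshold below are illustrative, not the paper's exact training procedure.

```python
import numpy as np

rng = np.random.default_rng(5)

# Weight means and per-weight log dropout rates (learned jointly in practice; fixed here).
theta = rng.normal(size=(4, 3))
log_alpha = np.full(theta.shape, -2.0)

# Multiplicative Gaussian noise per weight: w = theta * (1 + sqrt(alpha) * eps).
alpha = np.exp(log_alpha)
eps = rng.standard_normal(theta.shape)
w = theta * (1.0 + np.sqrt(alpha) * eps)

# Weights with very large alpha are dominated by noise and can be dropped, which is
# the sparsification effect the summary describes (threshold chosen for illustration).
prune_mask = log_alpha > 3.0
print(int(prune_mask.sum()), "of", theta.size, "weights pruned")
```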