# Variational Dropout via Empirical Bayes

@article{Kharitonov2018VariationalDV, title={Variational Dropout via Empirical Bayes}, author={Valery Kharitonov and Dmitry Molchanov and Dmitry P. Vetrov}, journal={ArXiv}, year={2018}, volume={abs/1811.00596} }

We study the Automatic Relevance Determination procedure applied to deep neural networks. We show that ARD applied to Bayesian DNNs with Gaussian approximate posterior distributions leads to a variational bound similar to that of variational dropout, and in the case of a fixed dropout rate, objectives are exactly the same. Experimental results show that the two approaches yield comparable results in practice even when the dropout rates are trained. This leads to an alternative Bayesian…

## 8 Citations

Structured Dropout Variational Inference for Bayesian Neural Networks

- Computer Science, Mathematics
- 2021

This work focuses on the inflexibility of the factorized structure in the Dropout posterior and proposes an improved method called Variational Structured Dropout (VSD), which employs an orthogonal transformation to learn a structured representation on the variational noise and consequently induces statistical dependencies in the approximate posterior.

Improving Bayesian Inference in Deep Neural Networks with Variational Structured Dropout

- Computer Science
- ArXiv
- 2021

This work focuses on restrictions of the factorized structure of the Dropout posterior, which is too inflexible to capture rich correlations among weight parameters of the true posterior, and proposes a novel method called Variational Structured Dropout (VSD) to overcome this limitation.

Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

- Computer Science
- RepL4NLP@ACL
- 2019

This article proposes an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural network compression and finds this method to be especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive.

Adaptive Neural Connections for Sparsity Learning

- Computer Science
- 2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2020

Adaptive Neural Connections (ANC) is proposed, a method for explicitly parameterizing fine-grained neuron-to-neuron connections via adjacency matrices at each layer that are learned through backpropagation; architectures augmented with ANC are shown to outperform their vanilla counterparts.

Bayesian Sparsification of Deep C-valued Networks

- Computer Science
- ICML
- 2020

Sparse Variational Dropout is extended to complex-valued neural networks and the proposed Bayesian technique is verified by conducting a large numerical study of the performance-compression trade-off of C-valued networks on two tasks.

Implementation of DNNs on IoT devices

- Computer Science
- Neural Computing and Applications
- 2019

This paper presents a comprehensive review on hardware-and-software-co-design approaches developed to implement DNNs on low-resource hardware platforms and can guide the design and implementation of the next generation of hardware and software solutions for real-world IoT applications.

A Bayesian multiscale CNN framework to predict local stress fields in structures with microscale features

- Computer Science
- Computational Mechanics
- 2021

This work proposes to replace the local microscale solution by an Encoder-Decoder Convolutional Neural Network that will generate fine-scale stress corrections to coarse predictions around unresolved microscale features, without prior parametrisation of local microscale problems.

Bayesian Sparsification Methods for Deep Complex-valued Networks

- Computer Science, Mathematics
- ArXiv
- 2020

The proposed Bayesian technique is verified by conducting a large numerical study of the performance-compression trade-off of C-valued networks on two tasks: image recognition on MNIST-like and CIFAR10 datasets and music transcription on MusicNet.

## References

Variational Dropout Sparsifies Deep Neural Networks

- Computer Science, Mathematics
- ICML
- 2017

Variational Dropout is extended to the case when dropout rates are unbounded, a way to reduce the variance of the gradient estimator is proposed, and first experimental results with individual dropout rates per weight are reported.

Variational Bayesian dropout: pitfalls and fixes

- Computer Science, Mathematics
- ICML
- 2018

This work proffers Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions, and shows that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit.

Auto-Encoding Variational Bayes

- Mathematics, Computer Science
- ICLR
- 2014

A stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case is introduced.

Variational Dropout and the Local Reparameterization Trick

- Computer Science, Mathematics
- NIPS
- 2015

The Variational dropout method is proposed, a generalization of Gaussian dropout, but with a more flexibly parameterized posterior, often leading to better generalization in stochastic gradient variational Bayes.

Variational Dropout and the Local Reparameterization Trick

- Mathematics, Computer Science
- NIPS 2015
- 2015

This work proposes variational dropout, a generalization of Gaussian dropout where the dropout rates are learned, often leading to better models, and allows inference of more flexibly parameterized posteriors.

Bayesian Compression for Deep Learning

- Computer Science, Mathematics
- NIPS
- 2017

This work argues that the most principled and effective way to attack the problem of compression and computational efficiency in deep learning is by adopting a Bayesian point of view, where through sparsity inducing priors the authors prune large parts of the network.

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

- Mathematics, Computer Science
- ICML
- 2016

A new theoretical framework is developed casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.

Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks

- Computer Science
- 1995

Practical techniques based on Gaussian approximations for implementation of these powerful methods for controlling, comparing and using adaptive networks are described.

Doubly Stochastic Variational Bayes for non-Conjugate Inference

- Mathematics, Computer Science
- ICML
- 2014

A simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces and allows for efficient use of gradient information from the model joint density is proposed.

Dropout: a simple way to prevent neural networks from overfitting

- Computer Science
- J. Mach. Learn. Res.
- 2014

It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.