Corpus ID: 235605976

Bayesian Deep Learning Hyperparameter Search for Robust Function Mapping to Polynomials with Noise

@article{Harilal2021BayesianDL,
  title={Bayesian Deep Learning Hyperparameter Search for Robust Function Mapping to Polynomials with Noise},
  author={Nidhin Harilal and Udit Bhatia and Auroop Ratan Ganguly},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.12532}
}
Advances in neural architecture search, as well as explainability and interpretability of connectionist architectures, have been reported in the recent literature. However, our understanding of how to design Bayesian Deep Learning (BDL) hyperparameters, specifically the depth, width, and ensemble size, for robust function mapping with uncertainty quantification is still emerging. This paper attempts to further our understanding by mapping Bayesian connectionist representations to polynomials… 
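
As a concrete reading of the abstract, the sketch below sets up the kind of experiment it describes: an ensemble of dropout networks fit to a noisy polynomial, with depth, width, and ensemble size exposed as the hyperparameters to search. This is a minimal illustration under assumed choices (the cubic target, the noise level, and every constant are invented here), not the authors' code.

```python
import torch
import torch.nn as nn

def make_mlp(depth, width, p_drop=0.1):
    """MLP whose depth and width are the BDL hyperparameters under study."""
    layers, d_in = [], 1
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.ReLU(), nn.Dropout(p_drop)]
        d_in = width
    layers.append(nn.Linear(d_in, 1))
    return nn.Sequential(*layers)

# Noisy cubic polynomial as an illustrative target function.
x = torch.linspace(-2, 2, 256).unsqueeze(1)
y = x**3 - 2 * x + 0.3 * torch.randn_like(x)

ensemble = [make_mlp(depth=3, width=64) for _ in range(5)]  # ensemble size 5
for net in ensemble:
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(500):
        opt.zero_grad()
        nn.functional.mse_loss(net(x), y).backward()
        opt.step()

# Ensemble spread as a crude proxy for predictive uncertainty.
with torch.no_grad():
    preds = torch.stack([net(x) for net in ensemble])
mean, std = preds.mean(0), preds.std(0)
```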

References

Showing 1-10 of 63 references

Bayesian deep learning with hierarchical prior: Predictions from limited and noisy data

A General Framework for Uncertainty Estimation in Deep Learning

This work proposes a novel framework for uncertainty estimation in neural networks, based on Bayesian belief networks and Monte-Carlo sampling, which outperforms previous methods by up to 23% in accuracy and has several desirable properties.
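
The framework combines data (sensor) uncertainty with model uncertainty. As a rough illustration only: the sketch below substitutes brute-force Monte-Carlo input sampling for the paper's analytic propagation of input noise, and dropout sampling for its model-uncertainty estimate; every parameter name and constant here is an assumption.

```python
import torch

@torch.no_grad()
def total_uncertainty(net, x, in_std=0.05, n_in=20, n_drop=20):
    """Combine data uncertainty (sampled input noise) with model
    uncertainty (sampled dropout masks) by naive Monte-Carlo."""
    net.train()  # keep dropout layers stochastic; no weights are updated
    preds = []
    for _ in range(n_in):
        x_noisy = x + in_std * torch.randn_like(x)
        preds += [net(x_noisy) for _ in range(n_drop)]
    preds = torch.stack(preds)
    return preds.mean(0), preds.var(0)
```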

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

A new theoretical framework is developed that casts dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.
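
In practice the technique amounts to leaving dropout active at test time and averaging stochastic forward passes. A minimal sketch, assuming a PyTorch model with dropout layers; the observation-noise precision `tau` is a user-chosen constant from the paper's derivation:

```python
import torch

@torch.no_grad()
def mc_dropout_moments(net, x, T=100, tau=10.0):
    """Predictive mean/variance from T stochastic forward passes with
    dropout left on; 1/tau adds the observation-noise term."""
    net.train()  # .train() keeps dropout stochastic; no weights change here
    ys = torch.stack([net(x) for _ in range(T)])
    mean = ys.mean(0)
    var = 1.0 / tau + (ys ** 2).mean(0) - mean ** 2
    return mean, var
```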

Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam

New natural-gradient algorithms reduce the effort of Gaussian mean-field variational inference by perturbing the network weights during gradient evaluations; uncertainty estimates can then be obtained cheaply from the vector that adapts the learning rate.
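
A caricature of one such update (loosely modeled on the Vadam idea, with invented constants): Adam's second-moment vector doubles as the precision of a diagonal Gaussian posterior, so sampling weight perturbations costs almost nothing beyond the optimizer state. In the actual method the gradient is evaluated at the perturbed weights.

```python
import torch

def vadam_like_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
                    prior_prec=1.0, n=1000, eps=1e-8):
    """One sketched update; grad should be evaluated at perturbed weights.
    n is the dataset size, prior_prec the Gaussian prior precision."""
    m = b1 * m + (1 - b1) * (grad + prior_prec * w / n)
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    w = w - lr * m_hat / (torch.sqrt(v_hat) + prior_prec / n + eps)
    sigma = 1.0 / torch.sqrt(n * v_hat + prior_prec)  # posterior std "for free"
    w_sample = w + sigma * torch.randn_like(w)        # perturbed weights
    return w, w_sample, m, v
```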

Subspace Inference for Bayesian Deep Learning

Low-dimensional subspaces of parameter space, such as the span of the first principal components of the stochastic gradient descent (SGD) trajectory, are constructed; these subspaces contain diverse sets of high-performing models, and Bayesian model averaging over the posterior induced within them produces accurate predictions and well-calibrated predictive uncertainty for both regression and image classification.
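
The subspace construction itself is simple. A sketch under assumed details (plain NumPy, with an isotropic Gaussian standing in for the posterior the paper actually infers over subspace coefficients):

```python
import numpy as np

def pca_subspace(trajectory, k=2):
    """k-dim subspace from SGD iterates (rows = flattened weight snapshots)."""
    W = np.asarray(trajectory)
    mean = W.mean(axis=0)
    _, _, Vt = np.linalg.svd(W - mean, full_matrices=False)
    return mean, Vt[:k]            # shift vector + orthonormal basis

def sample_weights(mean, basis, scale=1.0):
    """Draw one weight vector; average predictions over many such draws."""
    z = scale * np.random.randn(basis.shape[0])
    return mean + z @ basis
```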

Quantifying Uncertainty in Discrete-Continuous and Skewed Data with Bayesian Deep Learning

A discrete-continuous BDL model with Gaussian and lognormal likelihoods for uncertainty quantification (UQ) in climate, applied to precipitation; this is the first UQ model in statistical downscaling (SD) in which both aleatoric and epistemic uncertainties are characterized.
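
The discrete-continuous idea is to model the wet/dry occurrence and the positive amount separately. A sketch of such a likelihood, assuming the Bernoulli/lognormal split; `p_wet`, `mu`, and `sigma` would come from network heads and are plain arrays here:

```python
import numpy as np
from scipy.stats import lognorm

def disc_cont_nll(y, p_wet, mu, sigma):
    """Negative log-likelihood: Bernoulli wet/dry occurrence plus a
    lognormal density over strictly positive precipitation amounts."""
    y = np.asarray(y)
    wet = y > 0
    nll = -np.log(np.where(wet, p_wet, 1.0 - p_wet))
    nll[wet] -= lognorm.logpdf(y[wet], s=sigma[wet], scale=np.exp(mu[wet]))
    return nll.mean()
```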

Importance Estimation for Neural Network Pruning

A novel method is described that estimates the contribution of a neuron (filter) to the final loss and iteratively removes those with smaller scores; two variations of the method approximate a filter's contribution using first- and second-order Taylor expansions.
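
For the first-order variant, a filter's score is the squared sum of gradient-times-weight over its parameters. A minimal sketch for a PyTorch conv layer (run a backward pass first so `.grad` is populated):

```python
import torch

def taylor_importance(conv: torch.nn.Conv2d) -> torch.Tensor:
    """First-order Taylor score per output filter: (sum of grad * weight
    over the filter's parameters) squared. Lowest scores get pruned."""
    w, g = conv.weight, conv.weight.grad
    return (w * g).sum(dim=(1, 2, 3)).pow(2)
```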

Concrete Dropout

This work proposes a new dropout variant which gives improved performance and better-calibrated uncertainties; it uses a continuous relaxation of dropout’s discrete masks to allow automatic tuning of the dropout probability in large models, resulting in faster experimentation cycles.
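
The relaxation replaces the hard Bernoulli mask with a sigmoid of Gumbel-style noise, making the mask differentiable in the drop probability so that p itself can be learned. A sketch (the temperature and epsilon values are illustrative):

```python
import torch

def concrete_mask(p, shape, temp=0.1, eps=1e-7):
    """Differentiable relaxation of a Bernoulli dropout mask; p may be a
    tensor with requires_grad=True so it is tuned by backprop."""
    p = torch.as_tensor(p)
    u = torch.rand(shape).clamp(eps, 1 - eps)
    logit = (torch.log(p + eps) - torch.log(1 - p + eps)
             + torch.log(u) - torch.log(1 - u))
    return 1.0 - torch.sigmoid(logit / temp)   # ~1 keep, ~0 drop

# Usage: acts = acts * concrete_mask(p, acts.shape) / (1 - p)
```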

Geometry of energy landscapes and the optimizability of deep neural networks

This work analytically shows that the multilayered structure holds the key to optimizability: fixing the number of parameters and increasing network depth, the number of stationary points in the loss function decreases, minima become more clustered in parameter space, and the trade-off between the depth and width of minima becomes less severe.

On the Expressive Power of Deep Learning: A Tensor Analysis

It is proved that, besides a negligible set, all functions that can be implemented by a deep network of polynomial size require exponential size in order to be realized (or even approximated) by a shallow network.
...