Corpus ID: 247025813

Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations

Alexander Immer, Tycho F. A. van der Ouderaa, Vincent Fortuin, Gunnar Rätsch, Mark van der Wilk
Data augmentation is commonly applied to improve the performance of deep learning by enforcing the knowledge that certain transformations of the input preserve the output. Currently, the correct data augmentation is chosen through human effort and costly cross-validation, which makes it cumbersome to apply to new datasets. We develop a convenient gradient-based method for selecting the data augmentation. Our approach relies on phrasing data augmentation as an invariance in the prior distribution and… 

A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning

A principled taxonomy of the existing augmentation techniques used in visual RL is presented, along with an in-depth discussion of how to better leverage augmented data in different scenarios.

Laplacian Autoencoders for Learning Stochastic Representations

This work presents a Bayesian autoencoder for unsupervised representation learning, which is trained using a novel variational lower bound of the autoencoder evidence and an efficient way to compute its Hessian on high-dimensional data that scales linearly with data size.

Relaxing Equivariance Constraints with Non-stationary Continuous Filters

This work proposes a parameter-efficient relaxation of equivariance that can effectively interpolate between a non-equivariant linear product, a strictly equivariant convolution, and a strictly invariant mapping, and experimentally verifies that soft equivariance leads to improved performance on CIFAR-10 and CIFAR-100 image classification tasks.

Learning Invariances in Neural Networks

With this simple procedure, the correct set and extent of invariances are recovered on image classification, regression, segmentation, and molecular property prediction from a large space of augmentations, on training data alone.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
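The adaptive moment estimates mentioned in this summary can be sketched in a few lines; the following is a minimal NumPy illustration of the standard Adam update (the toy quadratic objective, step count, and learning rate are illustrative choices, not from the paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using bias-corrected first- and second-moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize the toy objective f(theta) = theta^2
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(abs(theta) < 0.5)
```

The per-coordinate scaling by the second-moment estimate is what makes the step size adaptive; the bias correction matters mainly in the first few iterations, when the moment estimates are still dominated by their zero initialization.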

Practical Gauss-Newton Optimisation for Deep Learning

A side result of this work is that for piecewise linear transfer functions, the network objective function can have no differentiable local maxima, which may partially explain why such transfer functions facilitate effective optimisation.

Wide Residual Networks

This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture where the depth and width of residual networks are decreased and the resulting network structures are called wide residual networks (WRNs), which are far superior over their commonly used thin and very deep counterparts.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
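The residual framework described above amounts to adding an identity shortcut around each block, so the block only has to learn a residual correction. A minimal NumPy sketch (the two-layer block, ReLU nonlinearity, and small-weight initialization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = x + F(x): the identity shortcut means the block learns a
    residual F rather than a full mapping, easing deep optimization."""
    return x + W2 @ relu(W1 @ x)

d = 8
x = rng.standard_normal(d)
W1 = rng.standard_normal((d, d)) * 0.01   # near-zero weights ...
W2 = rng.standard_normal((d, d)) * 0.01   # ... make F(x) small
y = residual_block(x, W1, W2)
# With near-zero weights the block starts close to the identity map
print(np.allclose(y, x, atol=1e-2))
```

Because the block reduces to the identity when F is near zero, stacking many such blocks does not degrade the signal path the way plain deep stacks can, which is the intuition behind the paper's optimization claims.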

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

K-FAC is an efficient method for approximating natural gradient descent in neural networks, based on an efficiently invertible approximation of the network's Fisher information matrix that is neither diagonal nor low-rank, and in some cases is completely non-sparse.
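The "efficiently invertible" part rests on a Kronecker-product identity: if a layer's Fisher block is approximated as A ⊗ G (second moments of layer inputs and of backpropagated output gradients), its inverse can be applied by inverting only the two small factors. A NumPy sketch of that identity (the matrix sizes and random SPD factors are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def spd(n):
    """Random symmetric positive-definite matrix."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

# Kronecker factors: A ~ input second moment, G ~ output-gradient second moment
A, G = spd(3), spd(4)
V = rng.standard_normal((4, 3))   # a gradient, shaped like the weight matrix

# Naive: invert the full 12x12 Kronecker product
naive = np.linalg.solve(np.kron(A, G), V.flatten(order="F"))

# K-FAC trick: (A ⊗ G)^{-1} vec(V) = vec(G^{-1} V A^{-1})
# (column-major vec, symmetric factors) -- only small inverses needed
kfac = (np.linalg.solve(G, V) @ np.linalg.inv(A)).flatten(order="F")

print(np.allclose(naive, kfac))
```

For a layer with n inputs and m outputs the full block is nm × nm, so this identity replaces an O((nm)³) inverse with O(n³ + m³) work, which is what makes the approximate natural gradient practical.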

MNIST handwritten digit database (2010)

Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning

This work presents a scalable marginal-likelihood estimation method to select both hyperparameters and network architectures, based on the training data alone, and it outperforms cross-validation and manual tuning on standard regression and image classification datasets, especially in terms of calibration and out-of-distribution detection.
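The marginal likelihood being estimated is the Laplace approximation to the model evidence: log p(D) ≈ log p(D | θ*) + log p(θ*) + (d/2) log 2π − (1/2) log det H, with θ* the MAP and H the Hessian of the negative log joint. A sanity-check sketch (my own toy 1-D Bayesian linear regression, where the log joint is exactly quadratic, so Laplace matches the closed-form evidence):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: y = w*x + noise, prior w ~ N(0, 1/alpha), noise variance sigma2
n, alpha, sigma2 = 20, 2.0, 0.25
x = rng.standard_normal(n)
y = 1.5 * x + np.sqrt(sigma2) * rng.standard_normal(n)

# Laplace approximation around the MAP (exact here: quadratic log joint)
H = alpha + x @ x / sigma2            # Hessian of -log p(y, w)
w_map = (x @ y / sigma2) / H          # MAP estimate
log_joint = (-0.5 * n * np.log(2 * np.pi * sigma2)
             - 0.5 * np.sum((y - w_map * x) ** 2) / sigma2
             + 0.5 * np.log(alpha / (2 * np.pi))
             - 0.5 * alpha * w_map ** 2)
laplace = log_joint + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(H)

# Exact log marginal likelihood: y ~ N(0, sigma2*I + x x^T / alpha)
S = sigma2 * np.eye(n) + np.outer(x, x) / alpha
sign, logdet = np.linalg.slogdet(S)
exact = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(S, y))

print(np.allclose(laplace, exact))
```

In a deep network the log joint is not quadratic and H is intractable, which is where the paper's scalable curvature approximations come in; the quantity being approximated, however, is exactly this log-determinant-penalized fit.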

Improving predictions of Bayesian neural networks via local linearization

In this paper we argue that in Bayesian deep learning, the frequently utilized generalized Gauss-Newton (GGN) approximation should be understood as a modification of the underlying probabilistic model.

Approximate Inference Turns Deep Networks into Gaussian Processes

This paper shows that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors, that a GP kernel and a nonlinear feature map can be obtained while training a DNN, and that the resulting kernel is the neural tangent kernel.
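The feature map in question comes from linearizing the network in its parameters: φ(x) = J(x), the Jacobian of the output with respect to all weights, so that k(x, x′) = J(x)·J(x′) is a valid kernel. A NumPy sketch with a tiny one-hidden-layer net (the architecture, hand-derived Jacobian, and PSD check are my own illustration, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny net f(x) = v . relu(W x); its parameter Jacobian gives the
# linearization feature map phi(x) = J(x) and kernel k(x,x') = J(x).J(x')
d, h = 3, 5
W = rng.standard_normal((h, d))
v = rng.standard_normal(h)

def jacobian(x):
    pre = W @ x
    act = np.maximum(pre, 0.0)                  # relu(W x)
    dv = act                                    # df/dv
    dW = np.outer(v * (pre > 0), x)             # df/dW via chain rule through relu
    return np.concatenate([dW.ravel(), dv])     # one flat feature vector

X = rng.standard_normal((10, d))
Phi = np.stack([jacobian(x) for x in X])
K = Phi @ Phi.T                                 # Gram matrix of the induced kernel

# A valid GP kernel matrix must be symmetric positive semi-definite
print(np.allclose(K, K.T), np.min(np.linalg.eigvalsh(K)) > -1e-8)
```

Because K is an inner product of Jacobian features it is automatically PSD; scaling it by the appropriate posterior covariance (rather than the identity, as here) recovers the GP posterior equivalence the paper establishes.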