Corpus ID: 18720237

Understanding symmetries in deep networks

@article{Badrinarayanan2015UnderstandingSI,
  title={Understanding symmetries in deep networks},
  author={Vijay Badrinarayanan and Bamdev Mishra and Roberto Cipolla},
  journal={ArXiv},
  year={2015},
  volume={abs/1511.01029}
}
Recent works have highlighted the scale invariance or symmetry present in the weight space of a typical deep network and the adverse effect it has on Euclidean gradient-based stochastic gradient descent optimization. In this work, we show that a commonly used deep network, which uses a convolution, batch normalization, ReLU, max-pooling, and sub-sampling pipeline, possesses more complex forms of symmetry arising from scaling-based reparameterization of the network weights. We propose to tackle the…
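
To make the symmetry concrete: a plain ReLU network is unchanged if the weights feeding each hidden unit are multiplied by a positive factor and the outgoing weights are divided by the same factor, because ReLU is positively homogeneous. The following is a minimal numpy sketch of that invariance (an illustration, not code from the paper):

import numpy as np

rng = np.random.default_rng(0)

# Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))
x = rng.standard_normal(8)

def relu(z):
    return np.maximum(z, 0.0)

def f(W1, W2, x):
    return W2 @ relu(W1 @ x)

# Scale the incoming weights of each hidden unit by alpha > 0 and the
# outgoing weights by 1 / alpha; relu(a * z) = a * relu(z) for a > 0,
# so the network function does not change.
alpha = rng.uniform(0.5, 2.0, size=16)   # one positive scale per hidden unit
W1_scaled = alpha[:, None] * W1          # rescale the rows of W1
W2_scaled = W2 / alpha[None, :]          # compensate in the columns of W2

print(np.allclose(f(W1, W2, x), f(W1_scaled, W2_scaled, x)))  # True

The Euclidean gradient is not invariant under this reparameterization (it scales inversely to the weights), which is the adverse effect on plain stochastic gradient descent referred to above.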

Citations

Symmetry-invariant optimization in deep networks
TLDR
This work shows that commonly used deep networks, such as those that use max-pooling and sub-sampling layers, possess more complex forms of symmetry arising from scaling-based reparameterization of the network weights.
Riemannian approach to batch normalization
TLDR
This work proposes intuitive and effective gradient clipping and regularization methods for the proposed algorithm by utilizing the geometry of the Riemannian manifold, which provides a new learning rule that is more efficient and easier to analyze.
Saddles in Deep Networks: Background and Motivation
TLDR
This work proposes a new hypothesis, based on recent theoretical findings and empirical studies, that deep neural network models actually converge to saddle points with high degeneracy, and verifies the famous Wigner's Semicircle Law in its experimental results.
Sharp Minima Can Generalize For Deep Nets
TLDR
It is argued that most notions of flatness are problematic for deep models and cannot be directly applied to explain generalization; focusing on deep networks with rectifier units, the particular geometry of parameter space induced by the inherent symmetries that these architectures exhibit is exploited.
Improving Optimization in Models With Continuous Symmetry Breaking
TLDR
This work uses tools from gauge theory in physics to design an optimization algorithm that solves the slow convergence problem of representation learning, and leads to a fast decay of Goldstone modes, to orders of magnitude faster convergence, and to more interpretable representations.
Are Saddles Good Enough for Deep Learning?
TLDR
This work proposes a new hypothesis, based on recent theoretical findings and empirical studies, that deep neural network models actually converge to saddle points with high degeneracy, and verifies the famous Wigner's Semicircle Law in the experimental results.
On Stein Variational Neural Network Ensembles
TLDR
It is found that SVGD using functional and hybrid kernels can overcome the limitations of deep ensembles, improving functional diversity and uncertainty estimation and approaching the true Bayesian posterior more closely.
On the Symmetries of Deep Learning Models and their Internal Representations
TLDR
This paper seeks to connect the symmetries arising from the architecture of a family of models with the symmetries of that family's internal representation of data, by calculating a set of fundamental symmetry groups, called the intertwiner groups of the model.
Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks
TLDR
A theoretical framework is developed to study the geometry of learning dynamics in neural networks, revealing a key mechanism of explicit symmetry breaking behind the efficiency and stability of modern neural networks.
Are saddles good enough for neural networks
TLDR
This work proposes a new hypothesis, based on recent theoretical findings and empirical studies, that neural network models actually converge to saddle points with high degeneracy, and verifies the famous Wigner's Semicircle Law in experimental results.
...

References

SHOWING 1-10 OF 23 REFERENCES
Revisiting Natural Gradient for Deep Networks
TLDR
This work describes how one can use unlabeled data to improve the generalization error obtained with natural gradient, and empirically evaluates the robustness of the algorithm to the ordering of the training set compared to stochastic gradient descent.
Natural Neural Networks
TLDR
A specific example is given that employs a simple and efficient reparametrization of the neural network weights by implicitly whitening the representation obtained at each layer, while preserving the feed-forward computation of the network.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
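
For reference, the transform introduced by this paper standardizes each feature over the mini-batch and then applies a learnable affine map. A minimal numpy sketch of the training-time forward pass (inference with running statistics is omitted):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature (column) over the batch axis, then scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(3.0, 5.0, size=(32, 10))   # a batch of 32 ten-dimensional activations
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # ~0 mean, ~1 std per feature

Because the normalization divides by the batch statistics, the output is invariant to positive rescaling of the weights that produced x, which is one source of the symmetry studied in the surveyed paper.
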
Path-SGD: Path-Normalized Optimization in Deep Neural Networks
TLDR
This work revisits the choice of SGD for training deep neural networks by reconsidering the appropriate geometry in which to optimize the weights, and suggests Path-SGD, which is an approximate steepest descent method with respect to a path-wise regularizer related to max-norm regularization.
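
As a rough illustration of the geometry involved (a sketch of the path-wise quantity, not the exact Path-SGD update), the squared path-l2 norm of a single-hidden-layer network sums, over every input-to-output path, the squared product of the weights along the path; unlike the Euclidean norm, it is invariant to the per-unit rescaling symmetry discussed above:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))    # input -> hidden
W2 = rng.standard_normal((4, 16))    # hidden -> output

def path_l2_sq(W1, W2):
    # Sum over all input -> hidden -> output paths of the squared product of
    # the two weights on the path; it factorizes per hidden unit.
    return np.sum((W1 ** 2).sum(axis=1) * (W2 ** 2).sum(axis=0))

# Invariance under the rescaling symmetry: scale the incoming weights of each
# hidden unit by alpha > 0 and the outgoing weights by 1 / alpha.
alpha = rng.uniform(0.5, 2.0, size=16)
print(np.isclose(path_l2_sq(W1, W2),
                 path_l2_sq(alpha[:, None] * W1, W2 / alpha[None, :])))  # True
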
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling
TLDR
The results show that SegNet achieves state-of-the-art performance even without use of additional cues such as depth, video frames or post-processing with CRF models.
Riemannian metrics for neural networks I: feedforward networks
TLDR
Four algorithms for neural network training are described, each adapted to different scalability constraints, based either on the natural gradient using the Fisher information matrix or on Hessian methods, scaled down in a specific way to allow for scalability while keeping some of their key mathematical properties.
Riemannian metrics for neural networks II: recurrent networks and learning symbolic data sequences
TLDR
A training procedure using gradient ascent in a Riemannian metric is introduced, which produces an algorithm independent of design choices such as the encoding of parameters and unit activities, for sparsely connected networks.
Manopt, a matlab toolbox for optimization on manifolds
TLDR
The Manopt toolbox, available at www.manopt.org, is a user-friendly, documented piece of software dedicated to simplifying experimentation with state-of-the-art Riemannian optimization algorithms, aiming in particular at lowering the entrance barrier.
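
Manopt itself is a MATLAB toolbox, so the following is only a conceptual numpy sketch of the kind of algorithm such toolboxes implement: Riemannian gradient descent on the unit sphere, where the Euclidean gradient is projected onto the tangent space at the current iterate and the update is retracted back onto the manifold.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
A = A + A.T                          # symmetric cost matrix

# Minimize f(x) = x' A x over the unit sphere; the minimizer is an
# eigenvector of A associated with the smallest eigenvalue.
x = rng.standard_normal(10)
x /= np.linalg.norm(x)

for _ in range(2000):
    egrad = 2.0 * A @ x              # Euclidean gradient of f
    rgrad = egrad - (egrad @ x) * x  # project onto the tangent space at x
    x = x - 0.01 * rgrad             # take a small step along -rgrad
    x /= np.linalg.norm(x)           # retract back onto the sphere

print(x @ A @ x, np.linalg.eigvalsh(A)[0])   # the two values should nearly match
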
Natural Gradient Works Efficiently in Learning
  • S. Amari
  • Neural Computation, 1998
TLDR
The dynamical behavior of natural gradient online learning is analyzed, and the learning rule is proved to be Fisher efficient, implying that it asymptotically has the same performance as the optimal batch estimation of parameters.
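
A generic sketch of the natural-gradient update that this line of work builds on (an illustration using the empirical Fisher for a toy logistic regression, where the Fisher matrix coincides with the Hessian of the negative log-likelihood): the Euclidean gradient is preconditioned by the inverse Fisher information matrix.

import numpy as np

rng = np.random.default_rng(0)

# Toy logistic regression; the natural-gradient step is
#   theta <- theta - F^{-1} grad,
# where F is the Fisher information matrix of the model.
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X = rng.standard_normal((200, 5))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

theta = np.zeros(5)
for _ in range(20):
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    grad = X.T @ (p - y) / len(y)                          # gradient of the mean NLL
    F = (X * (p * (1 - p))[:, None]).T @ X / len(y)        # empirical Fisher information
    theta -= np.linalg.solve(F + 1e-4 * np.eye(5), grad)   # damped natural-gradient step

print(theta.round(2))   # should roughly recover w_true
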
...