# Invariance reduces Variance: Understanding Data Augmentation in Deep Learning and Beyond

@article{Chen2019InvarianceRV, title={Invariance reduces Variance: Understanding Data Augmentation in Deep Learning and Beyond}, author={Shuxiao Chen and Edgar Dobriban and Jane H. Lee}, journal={ArXiv}, year={2019}, volume={abs/1907.10905} }

Many complex deep learning models have found success by exploiting symmetries in data. Convolutional neural networks (CNNs), for example, are ubiquitous in image classification due to their use of translation symmetry, as image identity is roughly invariant to translations. In addition, many other forms of symmetry such as rotation, scale, and color shift are commonly used via data augmentation: the transformed images are added to the training set. However, a clear framework for understanding…
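The augmentation scheme sketched in the abstract — appending label-preserving transforms of each training image — can be illustrated in a few lines. This is a minimal sketch, not the paper's method; the function name and the choice of horizontal flips as the transform are illustrative assumptions:

```python
import numpy as np

def augment_with_flips(images, labels):
    """Append horizontally flipped copies of each image to the training set.

    A minimal illustration of label-preserving data augmentation:
    image identity is roughly invariant to horizontal flips, so the
    flipped copies keep the original labels.
    """
    flipped = images[:, :, ::-1]  # reverse the width axis of (N, H, W) arrays
    aug_images = np.concatenate([images, flipped], axis=0)
    aug_labels = np.concatenate([labels, labels], axis=0)
    return aug_images, aug_labels
```

The same pattern extends to rotations, scalings, or color shifts by swapping in the corresponding transform.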


#### 21 Citations

On the Benefits of Invariance in Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2020

It is proved that training with data augmentation leads to better estimates of the risk and its gradients, and a PAC-Bayes generalization bound is provided for models trained with data augmentation. It is further shown that, compared to data augmentation, feature averaging reduces generalization error when used with convex losses and tightens PAC-Bayes bounds.

On the Generalization Effects of Linear Transformations in Data Augmentation

- Computer Science, Mathematics
- ICML
- 2020

This work considers a family of linear transformations and studies their effects on the ridge estimator in an over-parametrized linear regression setting, and proposes an augmentation scheme that searches over the space of transformations according to how uncertain the model is about the transformed data.

WeMix: How to Better Utilize Data Augmentation

- Computer Science, Mathematics
- ArXiv
- 2020

This work develops two novel algorithms, termed "AugDrop" and "MixLoss", to correct the data bias introduced by data augmentation, and proposes a generic algorithm, "WeMix", that combines AugDrop and MixLoss; its effectiveness is demonstrated in extensive empirical evaluations.

Data augmentation instead of explicit regularization

- Computer Science
- ArXiv
- 2018

The contribution of weight decay and dropout to generalization is not only superfluous when sufficient implicit regularization is provided; such techniques can also dramatically deteriorate performance if the hyperparameters are not carefully tuned for the architecture and data set.

A Hessian Based Complexity Measure for Deep Networks

- Computer Science, Mathematics
- ArXiv
- 2019

A new measure of the complexity of the function generated by a deep network, based on the integral of the norm of the tangent Hessian, is developed; it shows that the oft-used heuristic of data augmentation imposes an implicit Hessian regularization during learning.

Contrastive Representation Learning

- 2021

We propose methods to strengthen the invariance properties of representations obtained by contrastive learning. While existing approaches implicitly induce a degree of invariance as representations…

Enhanced Convolutional Neural Kernels

- 2019

Recent research shows that for training with ℓ2 loss, convolutional neural networks (CNNs) whose width (number of channels in convolutional layers) goes to infinity correspond to regression with…

Probabilistic symmetry and invariant neural networks

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2020

Drawing on tools from probability and statistics, a link between functional and probabilistic symmetry is established, and generative functional representations of joint and conditional probability distributions are obtained that are invariant or equivariant under the action of a compact group.

Enhanced Convolutional Neural Tangent Kernels

- Computer Science, Mathematics
- ArXiv
- 2019

The resulting kernel, CNN-GP with LAP and horizontal flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet, which is the best such result the authors know of for a classifier that is not a trained neural network.

Data augmentation and image understanding

- Computer Science
- ArXiv
- 2020

This dissertation focuses on vision and images, and uses data augmentation as a particularly useful inductive bias, as a more effective regularisation method for artificial neural networks, and as a framework to analyse and improve the invariance of vision models to perceptually plausible transformations.

#### References

Showing 1–10 of 81 references

Dreaming More Data: Class-dependent Distributions over Diffeomorphisms for Learned Data Augmentation

- Computer Science
- AISTATS
- 2016

This work aligns image pairs within each class under the assumption that the spatial transformation between images belongs to a large class of diffeomorphisms, and learns class-specific probabilistic generative models of the transformations in a Riemannian submanifold of the Lie group of diffeomorphisms.

Deep Symmetry Networks

- Computer Science, Mathematics
- NIPS
- 2014

Deep symmetry networks (symnets), a generalization of convnets that form feature maps over arbitrary symmetry groups, are introduced; they use kernel-based interpolation to tractably tie parameters and pool over symmetry spaces of any dimension.

A Kernel Theory of Modern Data Augmentation

- Computer Science, Mathematics
- ICML
- 2019

This paper provides a general model of augmentation as a Markov process, shows that kernels appear naturally with respect to this model even when kernel classification is not employed, and analyzes more directly the effect of augmentation on kernel classifiers.

Unsupervised Data Augmentation

- Computer Science
- ArXiv
- 2019

UDA has a small twist in that it makes use of harder and more realistic noise generated by state-of-the-art data augmentation methods, which leads to substantial improvements on six language tasks and three vision tasks even when the labeled set is extremely small.

Harmonic Networks: Deep Translation and Rotation Equivariance

- Computer Science, Mathematics
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017

H-Nets are presented, a CNN exhibiting equivariance to patch-wise translation and 360° rotation, and it is demonstrated that their layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization.

A Bayesian Data Augmentation Approach for Learning Deep Models

- Computer Science
- NIPS
- 2017

A novel Bayesian formulation of data augmentation is provided, where new annotated training points are treated as missing variables and generated based on the distribution learned from the training set; this approach produces better classification results than similar GAN models.

Exploiting Cyclic Symmetry in Convolutional Neural Networks

- Computer Science
- ICML
- 2016

This work introduces four operations which can be inserted into neural network models as layers and combined to make these models partially equivariant to rotations, and which enable parameter sharing across different orientations.

Dataset Augmentation in Feature Space

- Computer Science, Mathematics
- ICLR
- 2017

This paper adopts a simpler, domain-agnostic approach to dataset augmentation, and works in the space of context vectors generated by sequence-to-sequence models, demonstrating a technique that is effective for both static and sequential data.

Improved Regularization of Convolutional Neural Networks with Cutout

- Computer Science
- ArXiv
- 2017

This paper shows that the simple regularization technique of randomly masking out square regions of input during training, which is called cutout, can be used to improve the robustness and overall performance of convolutional neural networks.
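The masking operation described in this snippet can be sketched briefly. This is a hedged illustration of the cutout idea, not the paper's implementation; the function name, patch size, and zero-fill value are illustrative assumptions:

```python
import numpy as np

def cutout(image, size, rng=None):
    """Zero out a randomly positioned square patch of an (H, W) image.

    A minimal sketch of cutout-style masking: a size x size square
    centred at a random pixel is set to zero, clipped at the borders.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]
    cy, cx = rng.integers(h), rng.integers(w)  # random patch centre
    y0, y1 = max(cy - size // 2, 0), min(cy + size // 2, h)
    x0, x1 = max(cx - size // 2, 0), min(cx + size // 2, w)
    out = image.copy()  # leave the input image untouched
    out[y0:y1, x0:x1] = 0.0
    return out
```

Applying this to each training image (with a fresh random centre per epoch) reproduces the basic training-time behaviour the snippet describes.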

A Hessian Based Complexity Measure for Deep Networks

- Computer Science, Mathematics
- ArXiv
- 2019

A new measure of the complexity of the function generated by a deep network, based on the integral of the norm of the tangent Hessian, is developed; it shows that the oft-used heuristic of data augmentation imposes an implicit Hessian regularization during learning.