Regularising for invariance to data augmentation improves supervised learning

by Aleksander Botev, M. Bauer and Soham De
Data augmentation is used in machine learning to make the classifier invariant to label-preserving transformations. Usually this invariance is only encouraged implicitly, by sampling a single augmentation per image per training epoch. However, several works have recently shown that using multiple augmentations per input can improve generalisation or can be used to incorporate invariances more explicitly. In this work, we first empirically compare these recently proposed objectives that differ in…
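
The family of objectives compared here can be sketched, in simplified form, as a supervised loss plus an invariance penalty across augmentations. The function and parameter names below are mine, not the paper's; a minimal NumPy sketch:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def invariance_regularised_loss(logits_a, logits_b, labels, beta=1.0):
    # Cross-entropy on one augmentation, plus a KL penalty pulling the
    # predictive distributions of two augmentations of the same image together.
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    ce = -np.log(p_a[np.arange(len(labels)), labels]).mean()
    kl = (p_a * (np.log(p_a) - np.log(p_b))).sum(axis=-1).mean()
    return ce + beta * kl
```

When the two augmentations give identical logits the KL term vanishes and the objective reduces to standard cross-entropy.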

A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning

A principled taxonomy of the existing augmentation techniques used in visual RL and an in-depth discussion on how to better leverage augmented data in different scenarios are presented.



Data augmentation in Bayesian neural networks and the cold posterior effect

It is suggested that the cold posterior effect cannot be dismissed as an artifact of data augmentation using incorrect likelihoods, and multi-sample bounds tighter than those used previously are derived.
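
The bounds referred to can be motivated, in my own notation, by Jensen's inequality; the paper's exact multi-sample bounds may differ in detail:

```latex
% Let p(y \mid \tilde{x}) be the predictive probability under a random
% augmentation \tilde{x} of x. By Jensen's inequality,
\log \mathbb{E}_{\tilde{x}}\!\left[p(y \mid \tilde{x})\right]
  \;\ge\; \mathbb{E}_{\tilde{x}}\!\left[\log p(y \mid \tilde{x})\right],
% and the K-sample estimator that averages probabilities,
\mathcal{L}_K \;=\; \mathbb{E}_{\tilde{x}_1,\dots,\tilde{x}_K}
  \left[\log \tfrac{1}{K}\textstyle\sum_{k=1}^{K} p(y \mid \tilde{x}_k)\right],
% is a lower bound on \log \mathbb{E}_{\tilde{x}}[p(y \mid \tilde{x})]
% that tightens as K grows, as in IWAE-style bounds.
```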

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

A detailed empirical evaluation is presented of how the number of augmentation samples per unique image influences model performance on held-out data when training deep ResNets, and these insights are applied to the highly performant NFNet family, achieving 86.8% top-1 accuracy on ImageNet without extra data.
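
Drawing several augmentation samples per unique image amounts to replicating each image within the batch with independent augmentations; a minimal sketch, with a hypothetical flip augmentation standing in for a real pipeline:

```python
import numpy as np

def augment(x, rng):
    # Stand-in augmentation (hypothetical): random horizontal flip.
    return x[..., ::-1] if rng.random() < 0.5 else x

def multi_sample_batch(images, k, rng):
    # Replicate each image k times with independent augmentations, so a
    # single gradient step averages over k augmentation samples per image.
    return np.stack([augment(img, rng) for img in images for _ in range(k)])

rng = np.random.default_rng(0)
images = rng.normal(size=(8, 3, 32, 32))
batch = multi_sample_batch(images, k=4, rng=rng)  # 8 unique images x 4 samples
```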

Incorporating prior information in machine learning by creating virtual examples

It is shown that in some contexts this idea of using prior knowledge by creating virtual examples and thereby expanding the effective training-set size is mathematically equivalent to incorporating the prior knowledge as a regularizer, suggesting that the strategy is well motivated.
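
The equivalence can be sketched by the classic second-order argument (my notation; the paper's exact statement may differ):

```latex
% For small transformations x \mapsto x + \varepsilon v with \mathbb{E}[v] = 0,
% a second-order Taylor expansion of the loss \ell around x gives
\mathbb{E}_{v}\!\left[\ell(x + \varepsilon v)\right]
  \;\approx\; \ell(x)
  \;+\; \tfrac{\varepsilon^{2}}{2}\,
        \mathbb{E}_{v}\!\left[v^{\top}\,\nabla^{2}_{x}\ell(x)\, v\right],
% so training on virtual examples adds a curvature (Tikhonov-style)
% penalty -- the regulariser interpretation described above.
```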

High-Performance Large-Scale Image Recognition Without Normalization

An adaptive gradient clipping technique is developed that overcomes the instabilities of training deep networks without batch normalization, and a significantly improved class of Normalizer-Free ResNets is designed that attains significantly better performance when fine-tuning on ImageNet.
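
Unit-wise adaptive gradient clipping in the spirit of this work can be sketched as follows; the parameter names (`clip`, `eps`) are my own:

```python
import numpy as np

def adaptive_gradient_clip(grad, weight, clip=0.01, eps=1e-3):
    # Clip each output unit's gradient when its norm exceeds
    # clip * max(||weight_unit||, eps); otherwise leave it unchanged.
    w_norm = np.maximum(np.linalg.norm(weight, axis=-1, keepdims=True), eps)
    g_norm = np.maximum(np.linalg.norm(grad, axis=-1, keepdims=True), 1e-6)
    factor = np.minimum(1.0, clip * w_norm / g_norm)
    return grad * factor
```

Small gradients pass through untouched; only units whose gradient norm is large relative to their weight norm get rescaled.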

Characterizing signal propagation to close the performance gap in unnormalized ResNets

A simple set of analysis tools for characterizing signal propagation on the forward pass is proposed, and a technique that preserves the signal in networks with ReLU or Swish activation functions is developed by ensuring that the per-channel activation means do not grow with depth.
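
The kind of forward-pass statistic involved can be sketched as follows, pushing a random batch through a stack of layers and recording per-channel means and variances. This is purely illustrative, not the paper's tooling:

```python
import numpy as np

def signal_prop_stats(x, weights, phi=lambda z: np.maximum(z, 0.0)):
    # Forward a batch through linear + ReLU layers and record, per layer,
    # the average squared channel mean and the activation variance --
    # simple statistics for diagnosing signal propagation with depth.
    stats = []
    for w in weights:
        x = phi(x @ w)
        stats.append(((x.mean(axis=0) ** 2).mean(), x.var()))
    return stats

rng = np.random.default_rng(0)
ws = [rng.normal(size=(64, 64)) / np.sqrt(64) for _ in range(4)]
stats = signal_prop_stats(rng.normal(size=(128, 64)), ws)
```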

Representation Learning via Invariant Causal Mechanisms

A novel self-supervised objective, Representation Learning via Invariant Causal Mechanisms (ReLIC), is proposed that enforces invariant prediction of proxy targets across augmentations through an invariance regularizer which yields improved generalization guarantees.
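
A schematic version of such an invariance regulariser, assuming softmax distributions over similarities to a set of proxy targets (names and shapes are mine, not ReLIC's exact formulation):

```python
import numpy as np

def relic_style_penalty(feats_a, feats_b, targets, temp=0.1):
    # Compare each view's softmax distribution over similarities to a set
    # of proxy targets, and penalise the KL divergence between the two
    # views' distributions -- enforcing invariant prediction across augmentations.
    def probs(f):
        sims = f @ targets.T / temp
        sims = sims - sims.max(axis=-1, keepdims=True)
        e = np.exp(sims)
        return e / e.sum(axis=-1, keepdims=True)
    p, q = probs(feats_a), probs(feats_b)
    return (p * (np.log(p) - np.log(q))).sum(axis=-1).mean()
```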

Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning

This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
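
One ingredient that is easy to sketch is the target-network update: the target parameters track an exponential moving average of the online network's parameters. A toy version on flat arrays:

```python
import numpy as np

def ema_update(online, target, tau=0.99):
    # BYOL-style target update: the target network is an exponential
    # moving average of the online network, never updated by gradients.
    return [tau * t + (1.0 - tau) * o for o, t in zip(online, target)]
```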

Randaugment: Practical automated data augmentation with a reduced search space

This work proposes a simplified search space that vastly reduces the computational expense of automated augmentation, and permits the removal of a separate proxy task.
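
The reduced search space can be sketched as just two scalars: the number of operations and one shared magnitude. The op names below are placeholders, not the paper's exact list:

```python
import random

# Hypothetical op names; the point is the reduced search space: draw n_ops
# transformations uniformly and apply each at one shared global magnitude,
# instead of learning a policy through a separate proxy task.
OPS = ["rotate", "shear_x", "translate_y", "solarize", "contrast", "color"]

def randaugment_policy(n_ops=2, magnitude=9, rng=None):
    rng = rng or random.Random(0)
    return [(rng.choice(OPS), magnitude) for _ in range(n_ops)]
```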

CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features

Patches are cut and pasted among training images, with the ground-truth labels mixed proportionally to the area of the patches; CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification, as well as on the ImageNet weakly-supervised localization task.
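
A minimal sketch of the mixing step, assuming one-hot labels and channel-first images (helper name mine):

```python
import numpy as np

def cutmix(x1, y1, x2, y2, alpha=1.0, rng=None):
    # Paste a random box from x2 into x1, then mix the one-hot labels in
    # proportion to the actually pasted area (clipped at the image border).
    rng = rng or np.random.default_rng(0)
    h, w = x1.shape[-2:]
    lam = rng.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)
    top, bot = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = x1.copy()
    mixed[..., top:bot, left:right] = x2[..., top:bot, left:right]
    area = (bot - top) * (right - left)
    lam_adj = 1.0 - area / (h * w)     # label weight of the base image
    return mixed, lam_adj * y1 + (1.0 - lam_adj) * y2
```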