• Corpus ID: 235212148

Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error

Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

In computer vision, it is standard practice to draw a single sample from the data augmentation procedure for each unique image in the mini-batch. However, recent work has suggested that drawing multiple samples can achieve higher test accuracies. In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences model performance on held-out data when training deep ResNets. We demonstrate drawing multiple samples per image consistently…
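As a rough illustration of the idea in the abstract, the sketch below builds a mini-batch in which each unique image contributes `k` independently augmented copies. The augmentation (`random_flip_and_shift`), the helper `multi_sample_batch`, and all parameter names are hypothetical and are not taken from the paper; the sketch also does not address how the batch size or number of unique images per batch should be chosen.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift=4):
    """Hypothetical augmentation: random horizontal flip plus a random shift.

    `image` has shape (height, width, channels).
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # flip along the width axis
    pad = max_shift
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    h, w = image.shape[:2]
    return padded[top:top + h, left:left + w, :]

def multi_sample_batch(images, labels, k, rng):
    """Draw k independent augmentation samples for every unique image.

    Returns arrays of shape (len(images) * k, H, W, C) and (len(images) * k,),
    with the k copies of each image stored contiguously.
    """
    augmented = [
        random_flip_and_shift(img, rng) for img in images for _ in range(k)
    ]
    return np.stack(augmented), np.repeat(labels, k)

rng = np.random.default_rng(0)
images = rng.random((3, 8, 8, 3))   # three unique toy images
labels = np.array([0, 1, 2])
x, y = multi_sample_batch(images, labels, k=4, rng=rng)
```

With `k=1` this reduces to the standard practice of one augmentation sample per unique image.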

Regularising for invariance to data augmentation improves supervised learning

It is shown that the predictions of the best performing method are also the most similar when compared on different augmentations of the same input, and an explicit regulariser is proposed that improves generalisation and equalises performance differences between all considered objectives.

Deep AutoAugment

This work proposes a fully automated approach for data augmentation search named Deep AutoAugment (DeepAA), which progressively builds a multi-layer data augmentation pipeline from scratch by stacking augmentation layers one at a time until reaching convergence.

SmoothNets: Optimizing CNN architecture design for differentially private deep learning

By combining components which exhibit good individual performance, this work distilled a new model architecture termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training, and outperforms standard architectures on two benchmark datasets.

Unlocking High-Accuracy Differentially Private Image Classification through Scale

It is demonstrated that DP-SGD on over-parameterized models can perform significantly better than previously thought and is believed to be a step towards closing the accuracy gap between private and non-private image classification benchmarks.


VOLO: Vision Outlooker for Visual Recognition

A simple and generic new architecture, termed Vision Outlooker (VOLO), is proposed. It implements a novel outlook attention operation that dynamically conducts local feature aggregation in a sliding-window manner across the input image, which more efficiently encodes the fine-level features that are essential for high-performance visual recognition.

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations

Text Data Augmentation for Deep Learning

The major motifs of Data Augmentation are summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form.

Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities

First, the algorithmic speedup problem is formalized, then fundamental building blocks of algorithmically efficient training are used to develop a taxonomy, which highlights commonalities of seemingly disparate methods and reveals current research gaps.

Using Soft Labels to Model Uncertainty in Medical Image Segmentation

This work proposes a simple method to obtain soft labels from the annotations of multiple physicians and train models that, for each image, produce a single well-calibrated output that can be thresholded at multiple confidence levels, according to each application’s precision-recall requirements.



High-Performance Large-Scale Image Recognition Without Normalization

An adaptive gradient clipping technique is developed which overcomes the instabilities of training without batch normalization, and a significantly improved class of Normalizer-Free ResNets is designed, which attain significantly better performance when fine-tuning on ImageNet.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization, enabling visual recognition models to be trained on internet-scale data with high efficiency.

A survey on Image Data Augmentation for Deep Learning

This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation, a data-space solution to the problem of limited data.

Randaugment: Practical automated data augmentation with a reduced search space

This work proposes a simplified search space that vastly reduces the computational expense of automated augmentation, and permits the removal of a separate proxy task.

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

This work investigates the cause of the generalization drop in the large-batch regime and presents numerical evidence that supports the view that large-batch methods tend to converge to sharp minimizers of the training and testing functions, and, as is well known, sharp minima lead to poorer generalization.

CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features

Patches are cut and pasted among training images, with the ground truth labels mixed in proportion to the area of the patches. CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly supervised localization task.

Characterizing signal propagation to close the performance gap in unnormalized ResNets

A simple set of analysis tools to characterize signal propagation on the forward pass is proposed, and this technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth.

Rethinking the Inception Architecture for Computer Vision

This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible through suitably factorized convolutions and aggressive regularization.

Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks

This work develops a simple initialization scheme that can train deep residual networks without normalization, and provides a detailed empirical study of residual networks, which clarifies that, although batch normalized networks can be trained with larger learning rates, this effect is only beneficial in specific compute regimes, and has minimal benefits when the batch size is small.

Training data-efficient image transformers & distillation through attention

This work produces a competitive convolution-free transformer by training on ImageNet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token ensuring that the student learns from the teacher through attention.