Attentive Cutmix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification

@article{Walawalkar2020AttentiveCA,
  title={Attentive Cutmix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification},
  author={Devesh Walawalkar and Zhiqiang Shen and Zechun Liu and Marios Savvides},
  journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2020},
  pages={3642-3646}
}
Convolutional neural networks (CNNs) are capable of learning robust representations with different regularization methods and activations, as convolutional layers are spatially correlated. Based on this property, a large variety of regional dropout strategies have been proposed, such as Cutout [1], DropBlock [2], CutMix [3], etc. These methods aim to encourage the network to generalize better by partially occluding the discriminative parts of objects. However, all of them perform this operation…
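The abstract is truncated above, but together with the title it sets up the core idea: rather than cutting a random region, the most attended patches of one image are pasted onto another and the labels are weighted by the pasted fraction. A minimal sketch of such attention-guided patch mixing follows; the 7x7 grid, the top-6 patch count, and the pooled `attention` input (e.g. the last feature map of a pretrained CNN) are illustrative assumptions, not details taken from the truncated text.

```python
# Minimal sketch of attention-guided patch mixing (illustrative assumptions:
# a grid_size x grid_size partition of the image, an attention map of the
# same grid resolution, and a label weight equal to the pasted fraction).
import torch

def attentive_cutmix(img_a, img_b, attention, top_n=6, grid_size=7):
    """Paste the top_n most attended grid patches of img_a onto img_b.

    img_a, img_b: tensors of shape (C, H, W), H and W divisible by grid_size.
    attention:    tensor of shape (grid_size, grid_size), e.g. pooled from a
                  pretrained network's feature map (assumed input).
    Returns the mixed image and lam, the label weight of img_a.
    """
    _, h, w = img_a.shape
    ph, pw = h // grid_size, w // grid_size
    # Indices of the most attended cells, flattened row-major.
    top = torch.topk(attention.flatten(), top_n).indices
    mixed = img_b.clone()
    for idx in top.tolist():
        r, c = divmod(idx, grid_size)
        mixed[:, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = \
            img_a[:, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
    lam = top_n / (grid_size * grid_size)  # weight of img_a's label
    return mixed, lam
```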


Where to Cut and Paste: Data Regularization with Selective Features
TLDR
A new data augmentation strategy, called FocusMix, which exploits informative pixels based on proper sampling techniques, is proposed, and it is shown that FocusMix yields performance improvements compared to other data augmentation methods.
Grad-Cam Guided Progressive Feature CutMix for Classification
TLDR
Attentive feature cutmix is performed in a progressive manner among multi-branch classifiers trained on the same task, preventing the model from relying on only small regions and forcing it to gradually focus on larger areas.
KeepAugment: A Simple Information-Preserving Data Augmentation Approach
TLDR
This paper empirically shows that the standard data augmentation methods may introduce distribution shift and consequently hurt the performance on unaugmented data during inference, and proposes a simple yet effective approach, dubbed KeepAugment, to increase the fidelity of augmented images.
An overview of mixing augmentation methods and augmentation strategies
TLDR
This survey focuses on two DA research streams, image mixing and automated selection of augmentation strategies, and mainly covers methods published in the proceedings of top-tier conferences and in leading journals in the years 2017-2021.
Survey: Image Mixing and Deleting for Data Augmentation
TLDR
This paper empirically evaluates these approaches for image classification, fine-grained image recognition, and object detection, where it is shown that this category of data augmentation improves the overall performance of deep neural networks.
ScoreNet: Learning Non-Uniform Attention and Augmentation for Transformer-Based Histopathological Image Classification
TLDR
A transformer-based architecture specifically tailored for histopathological image classification is proposed, which combines fine-grained local attention with a coarse global attention mechanism to learn meaningful representations of high-resolution images at an efficient computational cost.
Intra-class Part Swapping for Fine-Grained Image Classification
TLDR
Intra-class Part Swapping (InPS) is proposed, which produces new data by performing attention-guided content swapping on input pairs from the same class and outperforms the most recent augmentation approaches in both fine-grained recognition and weakly supervised object localization.
Evolving Image Compositions for Feature Representation Learning
TLDR
This paper proposes PatchMix, a data augmentation method that creates new samples by composing patches from pairs of images in a grid-like pattern; it outperforms a base model on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet, and explores evolutionary search as a guiding strategy to jointly discover optimal grid-like patterns and image pairings.
RecursiveMix: Mixed Learning with History
TLDR
A recursive mixed-sample learning paradigm, termed “RecursiveMix” (RM), is proposed, by exploring a novel training strategy that leverages the historical input-prediction-label triplets and introduces a consistency loss to align the identical image semantics across the iterations, which helps the learning of scale-invariant feature representations.
Learning Temporally Invariant and Localizable Features via Data Augmentation for Video Recognition
TLDR
This work extends data augmentation strategies to the temporal dimension for videos to learn temporally invariant or temporally localizable features to cover temporal perturbations or complex actions in videos that cannot be achieved using spatial augmentation algorithms.

References

CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features
TLDR
Patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches, and CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on ImageNet weakly-supervised localization task.
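A minimal sketch of the patch-and-label mixing this entry describes, assuming the standard formulation: a box whose area ratio is drawn from a Beta distribution is filled with pixels from a second (shuffled) image, and the label weight is recomputed from the exact box area.

```python
# Minimal CutMix sketch: mix a random box between shuffled image pairs and
# weight the two labels by the actual pasted area (standard formulation).
import numpy as np
import torch

def cutmix(images, labels, alpha=1.0):
    """images: (B, C, H, W) tensor; labels: (B,) tensor of class indices."""
    images = images.clone()
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(images.size(0))
    _, _, h, w = images.shape
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)  # label weight from actual area
    return images, labels, labels[perm], lam
```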
Improved Regularization of Convolutional Neural Networks with Cutout
TLDR
This paper shows that the simple regularization technique of randomly masking out square regions of input during training, which is called cutout, can be used to improve the robustness and overall performance of convolutional neural networks.
Squeeze-and-Excitation Networks
Jie Hu, Li Shen, Gang Sun. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
TLDR
This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
Improving Object Detection from Scratch via Gated Feature Reuse
TLDR
A novel gate-controlled prediction strategy, enabled by Squeeze-and-Excitation, is introduced to adaptively enhance or attenuate supervision at different scales based on the input object size, which is more effective in detecting objects of diverse sizes.
DSOD: Learning Deeply Supervised Object Detectors from Scratch
TLDR
Deeply Supervised Object Detector (DSOD), a framework that can learn object detectors from scratch following the single-shot detection (SSD) framework, is presented; one of the key findings is that deep supervision, enabled by dense layer-wise connections, plays a critical role in learning a good detector.
Residual Attention Network for Image Classification
TLDR
The proposed Residual Attention Network is a convolutional neural network using an attention mechanism which can be incorporated into state-of-the-art feedforward network architectures in an end-to-end training fashion and can be easily scaled up to hundreds of layers.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Densely Connected Convolutional Networks
TLDR
The Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion, has several compelling advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
DropBlock: A regularization method for convolutional networks
TLDR
DropBlock is introduced, a form of structured dropout where units in a contiguous region of a feature map are dropped together, and it is found that applying DropBlock in skip connections in addition to the convolution layers increases the accuracy.
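A minimal sketch of the structured dropout this entry describes, under the assumption that each sampled seed is expanded into a block_size x block_size dropped region and the surviving activations are rescaled; drop_prob and block_size are illustrative hyperparameters.

```python
# Minimal DropBlock sketch: drop contiguous block_size x block_size regions
# of a feature map together, then rescale so the expected magnitude is kept.
import torch
import torch.nn.functional as F

def dropblock(x, drop_prob=0.1, block_size=7):
    """x: feature map of shape (B, C, H, W); block_size assumed odd."""
    if drop_prob == 0.0:
        return x
    _, _, h, w = x.shape
    # Seed probability chosen so the expected dropped fraction is drop_prob.
    gamma = (drop_prob * h * w / (block_size ** 2)
             / ((h - block_size + 1) * (w - block_size + 1)))
    seeds = (torch.rand_like(x) < gamma).float()
    # Expand each seed into a block_size x block_size dropped region.
    block_mask = 1.0 - F.max_pool2d(seeds, kernel_size=block_size,
                                    stride=1, padding=block_size // 2)
    return x * block_mask * block_mask.numel() / block_mask.sum()
```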
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.