Corpus ID: 238531789

Unveiling the Power of Mixup for Stronger Classifiers

@inproceedings{Liu2021UnveilingTP,
  title={Unveiling the Power of Mixup for Stronger Classifiers},
  author={Zicheng Liu and Siyuan Li and Di Wu and Zhiyuan Chen and Lirong Wu and Jianzhu Guo and Stan Z. Li},
  year={2021}
}
Mixup-based data augmentations have achieved great success as regularizers for deep neural networks. However, existing methods rely on deliberately handcrafted mixup policies, which either ignore or oversell the semantic matching between mixed samples and labels. Driven by their prior assumptions, early methods attempt to smooth decision boundaries by random linear interpolation, while others focus on maximizing class-related information via offline saliency optimization. As a result, the issue of label… 
Multi-Sample ζ-mixup: Richer, More Realistic Synthetic Samples from a p-Series Interpolant
TLDR
This work proposes ζ-mixup, a generalization of mixup with provably and demonstrably desirable properties that allows convex combinations of N ≥ 2 samples, leading to more realistic and diverse outputs that incorporate information from N original samples by using a p-series interpolant.
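A rough NumPy sketch of the p-series weighting described above, generalizing pairwise mixup to all N samples in a batch. The function name, the `gamma` default, and the per-sample permutation scheme are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def zeta_mixup(x, y, gamma=2.8):
    """Sketch of zeta-mixup: each output is a convex combination of all N
    batch samples with p-series weights w_i ~ i^(-gamma), normalized to 1.
    x: (N, ...) inputs; y: (N, C) one-hot labels. gamma is illustrative."""
    n = x.shape[0]
    # p-series weights over ranks 1..N, normalized to a convex combination
    w = np.arange(1, n + 1, dtype=np.float64) ** (-gamma)
    w /= w.sum()
    x_mix = np.empty_like(x, dtype=np.float64)
    y_mix = np.empty_like(y, dtype=np.float64)
    for i in range(n):
        # assumed scheme: each output sample gets its own random ordering,
        # so the dominant weight falls on a different original each time
        perm = np.random.permutation(n)
        x_mix[i] = np.tensordot(w, x[perm], axes=1)
        y_mix[i] = w @ y[perm]
    return x_mix, y_mix
```

A large gamma concentrates nearly all the weight on the first-ranked sample, recovering outputs close to the originals; smaller gamma values blend more aggressively across the batch.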

References

SHOWING 1-10 OF 66 REFERENCES
A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets
TLDR
This work proposes a downsampled version of ImageNet that contains exactly the same number of classes and images as ImageNet, with the only difference being that the images are downsampled to 32×32 pixels.
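The preprocessing step amounts to a plain image resize; a minimal Pillow sketch is below. The box filter is an assumption for illustration; the paper itself compares several downsampling algorithms:

```python
from PIL import Image

def downsample_to_32(path_in, path_out):
    # Resize an image to 32x32 pixels, in the spirit of the paper's
    # downsampled-ImageNet preprocessing; the box filter here is an
    # assumed choice, not necessarily the paper's preferred kernel.
    img = Image.open(path_in).convert("RGB")
    img.resize((32, 32), resample=Image.BOX).save(path_out)
```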
The Caltech-UCSD Birds-200-2011 Dataset
CUB-200-2011 is an extended version of CUB-200 [7], a challenging dataset of 200 bird species. The extended version roughly doubles the number of images per category and adds new part localization annotations.
CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features
TLDR
Patches are cut and pasted among training images, with the ground-truth labels mixed proportionally to the area of the patches; CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task.
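A minimal NumPy sketch of the cut-and-paste mixing the TLDR describes; the function name, `alpha` default, and batch-permutation pairing are illustrative:

```python
import numpy as np

def cutmix(x, y, alpha=1.0, rng=np.random):
    """CutMix sketch. x: (N, C, H, W) images, y: (N, K) one-hot labels.
    A random box from a shuffled copy of the batch is pasted in, and the
    labels are mixed in proportion to the pasted area."""
    n, _, h, w = x.shape
    lam = rng.beta(alpha, alpha)                 # target area ratio
    perm = rng.permutation(n)                    # partner for each sample
    cut = np.sqrt(1.0 - lam)                     # box side ratio
    ch, cw = int(h * cut), int(w * cut)
    cy, cx = rng.randint(h), rng.randint(w)      # box center
    top, bot = np.clip(cy - ch // 2, 0, h), np.clip(cy + ch // 2, 0, h)
    left, right = np.clip(cx - cw // 2, 0, w), np.clip(cx + cw // 2, 0, w)
    x_mix = x.copy()
    x_mix[:, :, top:bot, left:right] = x[perm, :, top:bot, left:right]
    # recompute lambda from the actual clipped box area
    lam = 1.0 - ((bot - top) * (right - left)) / (h * w)
    return x_mix, lam * y + (1.0 - lam) * y[perm]
```

Recomputing lambda after clipping keeps the label mix consistent with the pixels that were actually replaced when the box spills over the image border.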
mixup: Beyond Empirical Risk Minimization
TLDR
This work proposes mixup, a simple learning principle that trains a neural network on convex combinations of pairs of examples and their labels, which improves the generalization of state-of-the-art neural network architectures.
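The TLDR captures mixup in one line; a minimal NumPy sketch of that interpolation follows. The function name, the `alpha` default, and pairing via a random batch permutation are illustrative assumptions:

```python
import numpy as np

def mixup(x, y, alpha=0.2, rng=np.random):
    """Mixup sketch: convex combination of example pairs and their one-hot
    labels with lambda ~ Beta(alpha, alpha). alpha=0.2 is just a common
    illustrative choice; the paper tunes it per dataset."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(x.shape[0])
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```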
Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity
TLDR
A new perspective on batch mixup is proposed, and the optimal construction of a batch of mixup data is formulated by maximizing the saliency measure of each individual mixup sample while encouraging supermodular diversity among the constructed mixup data.
Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup
TLDR
Experiments show that Puzzle Mix achieves state-of-the-art generalization and adversarial robustness compared to other mixup methods on the CIFAR-100, Tiny-ImageNet, and ImageNet datasets.
Evaluating Weakly Supervised Object Localization Methods Right
TLDR
It is argued that the WSOL task is ill-posed with only image-level labels, and a new evaluation protocol is proposed in which full supervision is limited to a small held-out set that does not overlap with the test set.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
TLDR
This work proposes a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable, and shows that even non-attention-based models learn to localize discriminative regions of the input image.
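A minimal PyTorch sketch of the Grad-CAM recipe: weight a convolutional layer's activation maps by the spatially averaged gradients of the target class score, then apply ReLU. The choice of `resnet18`/`layer4` in the usage lines is an assumption for illustration:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def grad_cam(model, layer, x, class_idx):
    """Grad-CAM sketch: capture the layer's activations and the gradient of
    the class score w.r.t. them via hooks, then form a weighted heatmap."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        score = model(x)[0, class_idx]   # target class logit
        model.zero_grad()
        score.backward()
    finally:
        h1.remove(); h2.remove()
    a, g = acts[0], grads[0]                     # each (1, C, H, W)
    weights = g.mean(dim=(2, 3), keepdim=True)   # global-average-pooled grads
    cam = F.relu((weights * a).sum(dim=1))       # (1, H, W)
    return cam / (cam.max() + 1e-8)              # normalize to [0, 1]

# usage sketch (model and layer choice are illustrative)
model = resnet18(weights=None).eval()
x = torch.randn(1, 3, 224, 224)
heatmap = grad_cam(model, model.layer4, x, class_idx=0)
```

The resulting low-resolution heatmap is typically upsampled to the input size and overlaid on the image; this saliency view is what several of the mixup methods above rely on for localizing class-relevant regions.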
A ConvNet for the 2020s
TLDR
This work gradually “modernizes” a standard ResNet toward the design of a vision Transformer, discovering several key components that contribute to the performance difference along the way and arriving at a family of pure ConvNet models dubbed ConvNeXt.
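A hedged PyTorch sketch of the modernized residual block that results from that process: a 7×7 depthwise convolution, LayerNorm applied channels-last, and an inverted pointwise MLP with GELU. Dimensions are illustrative, and the official ConvNeXt block additionally uses layer scale and stochastic depth, omitted here:

```python
import torch
import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Simplified ConvNeXt-style block (no layer scale / drop path)."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)            # applied channels-last
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # inverted bottleneck
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):                        # x: (N, C, H, W)
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # to (N, H, W, C)
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # back to (N, C, H, W)
        return shortcut + x
```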
TransMix: Attend to Mix for Vision Transformers
TLDR
TransMix is proposed, which mixes labels based on the attention maps of Vision Transformers and consistently improves various ViT-based models at different scales on ImageNet classification.
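A small sketch of the label re-weighting idea: instead of the CutMix area ratio, the mixing weight comes from the share of the class token's attention that falls on the pasted patches. The function name, input shapes, and per-sample normalization are assumptions; details such as head averaging follow the paper and its released code:

```python
import torch

def transmix_lambda(attn_cls, box_mask):
    """attn_cls: (N, P) class-token attention over P patches (assumed
    already averaged over heads); box_mask: (N, P) binary mask, 1 where
    patches come from the pasted partner image. Returns the weight of
    the original label for each sample."""
    attn = attn_cls / attn_cls.sum(dim=1, keepdim=True)  # normalize per sample
    lam_pasted = (attn * box_mask).sum(dim=1)            # attention on pasted region
    return 1.0 - lam_pasted

# usage sketch: mixed_y = lam[:, None] * y + (1 - lam)[:, None] * y[perm]
```

The effect is that a pasted patch only contributes to the mixed label in proportion to how much the model actually attends to it, rather than to its raw pixel area.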
...