Corpus ID: 231879922

High-Performance Large-Scale Image Recognition Without Normalization

@article{Brock2021HighPerformanceLI,
  title={High-Performance Large-Scale Image Recognition Without Normalization},
  author={Andrew Brock and Soham De and Samuel L. Smith and Karen Simonyan},
  journal={ArXiv},
  year={2021},
  volume={abs/2102.06171}
}
Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size and interactions between examples. Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, and are often unstable for large learning rates or strong data augmentations. In this work, we develop an adaptive gradient…
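The abstract is cut off here; the technique it introduces is the paper's adaptive gradient clipping (AGC). As a rough NumPy sketch (the `clip` and `eps` defaults are illustrative), AGC rescales each unit's gradient whenever its norm grows too large relative to the norm of the corresponding weights:

```python
import numpy as np

def adaptive_gradient_clip(grad, weight, clip=0.01, eps=1e-3):
    """Unit-wise adaptive gradient clipping (AGC), sketched from the paper.

    Each row ("unit") of the gradient is rescaled whenever its norm exceeds
    `clip` times the norm of the corresponding weight row.
    """
    # Flatten trailing dims so each row corresponds to one output unit.
    g = grad.reshape(grad.shape[0], -1)
    w = weight.reshape(weight.shape[0], -1)

    g_norm = np.linalg.norm(g, axis=1, keepdims=True)
    w_norm = np.maximum(np.linalg.norm(w, axis=1, keepdims=True), eps)

    max_norm = clip * w_norm
    # Rescale only the rows whose gradient norm exceeds the allowed maximum.
    scale = np.where(g_norm > max_norm, max_norm / np.maximum(g_norm, 1e-6), 1.0)
    return (g * scale).reshape(grad.shape)
```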
Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error
TLDR
It is found that drawing multiple samples per image consistently enhances the test accuracy achieved for both small and large batch training, despite reducing the number of unique training examples in each mini-batch.
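A minimal sketch of the batch-construction idea, where `augment` is a hypothetical stand-in for any stochastic augmentation pipeline: each of the B unique images contributes several independently augmented copies to the mini-batch.

```python
import numpy as np

def augment(image, rng):
    """Hypothetical stochastic augmentation: here just a random horizontal flip."""
    return image[:, ::-1, :] if rng.random() < 0.5 else image

def build_multi_sample_batch(images, labels, num_samples, rng):
    """Draw `num_samples` independent augmentations of every image.

    A batch of B unique images becomes B * num_samples training examples,
    trading unique images for repeated (differently augmented) ones.
    """
    batch_images, batch_labels = [], []
    for image, label in zip(images, labels):
        for _ in range(num_samples):
            batch_images.append(augment(image, rng))
            batch_labels.append(label)
    return np.stack(batch_images), np.array(batch_labels)
```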
CoAtNet: Marrying Convolution and Attention for All Data Sizes
TLDR
CoAtNets (pronounced “coat” nets), a family of hybrid models built from two key insights: (1) depthwise Convolution and self-Attention can be naturally unified via simple relative attention and (2) vertically stacking convolution layers and attention layers in a principled way is surprisingly effective in improving generalization, capacity and efficiency.
Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet
TLDR
By slightly tuning the structure of vision transformers and introducing token labeling, a new training objective, these models are able to achieve better results than their CNN counterparts and other transformer-based classification models with a similar amount of training parameters and computation.
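A hedged sketch of what such a token-labeling objective can look like: the usual class-token loss is combined with a per-patch cross-entropy against dense token-level label maps. The `beta` weighting and the soft-label form below are assumptions for illustration.

```python
import numpy as np

def cross_entropy(logits, target_probs):
    """Soft-label cross entropy for a single prediction (logits is a 1-D vector)."""
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -np.sum(target_probs * log_probs)

def token_labeling_loss(cls_logits, token_logits, image_label, token_labels, beta=0.5):
    """Sketch of a token-labeling objective (the `beta` weighting is illustrative).

    cls_logits:   (C,) logits from the class token
    token_logits: (N, C) logits from the N patch tokens
    image_label:  (C,) one-hot (or soft) image-level label
    token_labels: (N, C) dense per-token label distributions
    """
    cls_loss = cross_entropy(cls_logits, image_label)
    token_loss = np.mean([cross_entropy(t, y) for t, y in zip(token_logits, token_labels)])
    return cls_loss + beta * token_loss
```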
Effect of large-scale pre-training on full and few-shot transfer learning for natural and medical images
TLDR
This work conducts large-scale pre-training on large source datasets of either natural (ImageNet-21k/1k) or medical chest X-ray images and compares full and few-shot transfer using different target datasets from both natural and medical imaging domains, indicating the possibility of obtaining high-quality models for domain-specific transfer by pre-training instead on comparably very large, generic source data.
Making EfficientNet More Efficient: Exploring Batch-Independent Normalization, Group Convolutions and Reduced Resolution Training
TLDR
This work focuses on improving the practical efficiency of the state-of-the-art EfficientNet models on a new class of accelerator, the Graphcore IPU, by extending this family of models in the following ways: generalising depthwise convolutions to group convolutions, lowering the training resolution to reduce compute, and inexpensively fine-tuning at higher resolution.
ResNet strikes back: An improved training procedure in timm
TLDR
This paper re-evaluates the performance of the vanilla ResNet-50 when trained with a procedure that integrates such advances, and shares competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work.
Parameter Prediction for Unseen Deep Architectures
TLDR
This work proposes a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU, and learns a strong representation of neural architectures enabling their analysis.
Going deeper with Image Transformers
TLDR
This work builds and optimizes deeper transformer networks for image classification and investigates the interplay of architecture and optimization in such dedicated transformers, improving the accuracy of deep transformers.
VOLO: Vision Outlooker for Visual Recognition
TLDR
A novel outlook attention mechanism, termed Vision Outlooker (VOLO), is introduced; it efficiently encodes finer-level features and contexts into tokens, which is shown to be critically beneficial to recognition performance but largely ignored by self-attention.
Scaling Vision with Sparse Mixture of Experts
TLDR
This work presents a Vision MoE (V-MoE), a sparse version of the Vision Transformer that is scalable and competitive with the largest dense networks, and proposes an extension to the routing algorithm that can prioritize subsets of each input across the entire batch, leading to adaptive per-image compute.

References

Showing 1–10 of 106 references
Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification
TLDR
Although WN achieves better training accuracy, the final test accuracy is significantly lower than that of BN, demonstrating the surprising strength of the BN regularization effect, which could not be compensated for with standard regularization techniques like dropout and weight decay.
Weight Standardization
TLDR
Weight Standardization is proposed to accelerate deep network training by standardizing the weights in the convolutional layers, which is able to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients.
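A minimal NumPy sketch of the standardization step: each output channel's weights are shifted to zero mean and scaled to unit variance before the kernel is used in the convolution (the `eps` constant is illustrative).

```python
import numpy as np

def standardize_weights(weight, eps=1e-5):
    """Weight Standardization sketch: zero mean, unit variance per output channel.

    `weight` has shape (out_channels, in_channels, kh, kw); statistics are
    computed over all remaining axes of each output channel.
    """
    w = weight.reshape(weight.shape[0], -1)
    mean = w.mean(axis=1, keepdims=True)
    std = w.std(axis=1, keepdims=True)
    return ((w - mean) / (std + eps)).reshape(weight.shape)
```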
Fixing the train-test resolution discrepancy
TLDR
It is experimentally validated that, for a target test resolution, using a lower train resolution offers better classification at test time, and a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ is proposed.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
TLDR
This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but when these are addressed the trained networks exhibit good generalization, enabling visual recognition models to be trained on internet-scale data with high efficiency.
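The core recipe is the linear scaling rule with gradual warmup. A small sketch, using the base rate of 0.1, reference batch size of 256, and five warmup epochs from the paper's ResNet-50 setup (the usual step decay schedule is omitted):

```python
def learning_rate(step, steps_per_epoch, batch_size,
                  base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Linear-scaling rule with gradual warmup, sketched from the paper's recipe.

    The target rate scales linearly with batch size (base_lr * batch / 256) and
    is ramped up linearly from a small value over the first few epochs.
    """
    target_lr = base_lr * batch_size / base_batch
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr
```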
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
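For reference, a NumPy sketch of the normalization itself in training mode for (N, C) activations; the running statistics used at inference time are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch Normalization sketch (training mode) for (N, C) activations.

    Each feature is normalized with the mean and variance computed over the
    mini-batch, then scaled and shifted by the learned parameters gamma and beta.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```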
Exploring the Limits of Weakly Supervised Pretraining
TLDR
This paper presents a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images and shows improvements on several image classification and object detection tasks, and reports the highest ImageNet-1k single-crop, top-1 accuracy to date.
Group Normalization
TLDR
Group Normalization can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
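A NumPy sketch of the group-wise normalization, assuming (N, C, H, W) activations and 32 groups (the paper's default); the statistics are computed per example, so they do not depend on the batch size.

```python
import numpy as np

def group_norm(x, gamma, beta, num_groups=32, eps=1e-5):
    """Group Normalization sketch for (N, C, H, W) activations.

    Channels are split into groups and each group is normalized per example,
    so the statistics are independent of the batch size.
    """
    n, c, h, w = x.shape
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    x = xg.reshape(n, c, h, w)
    return gamma.reshape(1, c, 1, 1) * x + beta.reshape(1, c, 1, 1)
```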
Characterizing signal propagation to close the performance gap in unnormalized ResNets
TLDR
A simple set of analysis tools to characterize signal propagation on the forward pass is proposed, and the resulting technique preserves the signal in networks with ReLU or Swish activation functions by ensuring that the per-channel activation means do not grow with depth.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features
TLDR
Patches are cut and pasted among training images, with the ground-truth labels mixed proportionally to the area of the patches; CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task.
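A NumPy sketch of the augmentation, assuming one-hot labels and a Beta(alpha, alpha) mixing coefficient; the label-mixing weight equals the fraction of the image area kept from the original, as described above.

```python
import numpy as np

def cutmix(images, labels, rng, alpha=1.0):
    """CutMix sketch: paste a random box from a shuffled batch and mix labels.

    images: (N, H, W, C), labels: (N, num_classes) one-hot,
    rng: a numpy Generator, e.g. np.random.default_rng().
    """
    n, h, w, _ = images.shape
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)

    # Box sides follow sqrt(1 - lam) so the box area is about (1 - lam) * H * W.
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)

    mixed = images.copy()
    mixed[:, y1:y2, x1:x2, :] = images[perm, y1:y2, x1:x2, :]
    # Recompute lambda from the actual clipped box area.
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed, mixed_labels
```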