Shake-Shake regularization of 3-branch residual networks
@inproceedings{Gastaldi2017ShakeShakeRO,
  title     = {Shake-Shake regularization of 3-branch residual networks},
  author    = {Xavier Gastaldi},
  booktitle = {International Conference on Learning Representations},
  year      = {2017}
}
The method introduced in this paper aims to help computer vision practitioners faced with an overfitting problem. The idea is to replace, in a 3-branch ResNet, the standard summation of residual branches with a stochastic affine combination. The largest tested model improves on the best single-shot published result on CIFAR-10 by reaching 2.86% test error. Code is available at https://github.com/xgastaldi/shake-shake
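As a rough illustration of the stochastic affine combination described above, the sketch below implements a shake-shake-style residual block in PyTorch. The branch structure, channel sizes, and the use of the expected coefficient 0.5 at test time are illustrative assumptions, not a reproduction of the released Torch code.

```python
# Minimal sketch of a shake-shake residual block, assuming a PyTorch setting.
import torch
import torch.nn as nn

class ShakeShakeFn(torch.autograd.Function):
    """Mix two branches with a random alpha in the forward pass and an
    independent random beta in the backward pass (training-time behaviour)."""
    @staticmethod
    def forward(ctx, b1, b2):
        alpha = torch.rand(b1.size(0), 1, 1, 1, device=b1.device)
        return alpha * b1 + (1 - alpha) * b2

    @staticmethod
    def backward(ctx, grad_out):
        beta = torch.rand(grad_out.size(0), 1, 1, 1, device=grad_out.device)
        return beta * grad_out, (1 - beta) * grad_out

class ShakeShakeBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
        self.branch1, self.branch2 = branch(), branch()

    def forward(self, x):
        b1, b2 = self.branch1(x), self.branch2(x)
        if self.training:
            mix = ShakeShakeFn.apply(b1, b2)
        else:
            mix = 0.5 * (b1 + b2)   # expectation of the random combination
        return x + mix               # third branch: the identity shortcut
```

The per-image random coefficients, drawn anew at every pass, are what replaces the deterministic summation of the two residual branches.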
58 Citations
HybridNet: Classification and Reconstruction Cooperation for Semi-Supervised Learning
- Computer Science, ECCV
- 2018
A new model for leveraging unlabeled data to improve the generalization performance of image classifiers: a two-branch encoder-decoder architecture called HybridNet, able to outperform state-of-the-art results on CIFAR-10, SVHN and STL-10 in various semi-supervised settings.
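For intuition only, here is a minimal sketch of combining a supervised classification loss on labelled images with a reconstruction loss on unlabelled ones; it is a deliberate simplification and not HybridNet's actual two-branch encoder-decoder. Layer sizes assume 32x32 RGB inputs and are purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHybrid(nn.Module):
    """Shared encoder with a classification head and a reconstruction decoder
    (a strong simplification of HybridNet's two-branch design)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1))

    def forward(self, x):
        h = self.encoder(x)
        return self.classifier(h.flatten(1)), self.decoder(h)

def semi_supervised_loss(model, x_lab, y_lab, x_unlab, lam=1.0):
    # Supervised term on labelled images, reconstruction term on all images.
    logits, recon_lab = model(x_lab)
    _, recon_unlab = model(x_unlab)
    loss = F.cross_entropy(logits, y_lab)
    loss += lam * (F.mse_loss(recon_lab, x_lab) + F.mse_loss(recon_unlab, x_unlab))
    return loss
```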
An improved multi-branch residual network based on random multiplier and adaptive cosine learning rate method
- Computer Science, J. Vis. Commun. Image Represent.
- 2019
SMASH: One-Shot Model Architecture Search through HyperNetworks
- Computer Science, ICLR
- 2018
A technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture is proposed, achieving competitive performance with similarly-sized hand-designed networks.
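A toy illustration of the hypernetwork idea, under the assumption of a PyTorch setting: a small module maps an architecture encoding vector to the weights of a convolution used by the main model. It is not SMASH's memory-bank scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyHyperNet(nn.Module):
    """Generate the weights of a 3x3 convolution from an architecture encoding."""
    def __init__(self, enc_dim, in_ch, out_ch):
        super().__init__()
        self.in_ch, self.out_ch = in_ch, out_ch
        self.fc = nn.Linear(enc_dim, out_ch * in_ch * 3 * 3)

    def forward(self, arch_encoding, x):
        # The main model's convolution uses weights produced by the hypernet,
        # so many candidate architectures can be scored without training each one.
        w = self.fc(arch_encoding).view(self.out_ch, self.in_ch, 3, 3)
        return F.conv2d(x, w, padding=1)
```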
An overview of mixing augmentation methods and augmentation strategies
- Computer Science, Artificial Intelligence Review
- 2022
This review mainly covers methods published in the proceedings of top-tier conferences and in leading journals in the years 2017–2021, focusing on two DA research streams: image mixing and automated selection of augmentation strategies.
A Two-Stage Shake-Shake Network for Long-Tailed Recognition of SAR Aerial View Objects
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2022
A two-stage shake-shake network is proposed to tackle the long-tailed learning problem and decouples the learning procedure into the representation learning stage and the classification learning stage to improve the accuracy.
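A hedged sketch of the decoupled two-stage training described above, assuming a PyTorch backbone/classifier split; the class-balanced sampling typically used in the second stage is omitted.

```python
import torch.nn as nn

def two_stage_training(backbone, classifier, train_stage):
    """Stage 1 learns the representation and classifier jointly; stage 2 freezes
    the backbone and retrains only the classifier (illustrative sketch)."""
    if train_stage == 1:
        params = list(backbone.parameters()) + list(classifier.parameters())
    else:
        for p in backbone.parameters():
            p.requires_grad_(False)          # the learned representation is kept fixed
        for m in classifier.modules():       # re-initialise the classification head
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, std=0.01)
                nn.init.zeros_(m.bias)
        params = list(classifier.parameters())
    return params                             # hand these to the optimiser for this stage
```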
InAugment: Improving Classifiers via Internal Augmentation
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
- 2021
A novel augmentation operation, InAugment, that exploits internal image statistics is suggested, improving not only the model's accuracy and confidence but also its performance on out-of-distribution images.
Faster AutoAugment: Learning Augmentation Strategies using Backpropagation
- Computer Science, ECCV
- 2020
This paper proposes a differentiable policy search pipeline for data augmentation, which achieves significantly faster searching than prior work without a performance drop and introduces approximate gradients for several transformation operations with discrete parameters.
Trainable Weight Averaging for Fast Convergence and Better Generalization
- Computer Science, ArXiv
- 2022
Trainable Weight Averaging (TWA), a training method that operates in a reduced subspace spanned by historical solutions, is proposed; it largely reduces the estimation error of SWA, both further improving the SWA solutions and taking full advantage of the solutions generated early in training, where SWA fails.
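A minimal sketch of the underlying idea, assuming K historical parameter vectors have already been flattened into tensors: the averaged solution is a learned combination of them (the softmax constraint below is an illustrative choice, not necessarily the paper's parameterization).

```python
import torch
import torch.nn as nn

class TrainableWeightAverage(nn.Module):
    """Learn mixing coefficients over K flattened historical parameter vectors."""
    def __init__(self, checkpoints):                 # list of K tensors of shape (D,)
        super().__init__()
        self.register_buffer("W", torch.stack(checkpoints))       # (K, D), kept fixed
        self.logits = nn.Parameter(torch.zeros(len(checkpoints)))

    def averaged_parameters(self):
        coeffs = torch.softmax(self.logits, dim=0)   # convex combination of solutions
        return coeffs @ self.W                       # a point in the spanned subspace
```

Optimizing the coefficients then requires running the network with the averaged weights (e.g. via a functional forward pass) so that the task loss can backpropagate into the K mixing coefficients only.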
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
- Computer Science, ICML
- 2022
This paper proposes an effective method to improve the model generalization by additionally penalizing the gradient norm of loss function during optimization, and shows that the recent sharpness-aware minimization method is a special, but not the best, case of this method.
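The core idea lends itself to a short sketch, assuming a PyTorch model and a differentiable criterion; the penalty weight lam is illustrative.

```python
import torch

def loss_with_grad_norm_penalty(model, criterion, x, y, lam=0.1):
    """Task loss plus a penalty on the norm of its gradient w.r.t. the weights."""
    loss = criterion(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True, allow_unused=True)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads if g is not None))
    return loss + lam * grad_norm
```

Calling .backward() on the returned value differentiates through the gradient itself (a second-order term), which is what makes the penalty effective but also roughly doubles the cost of a training step.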
FreezeOut: Accelerate Training by Progressively Freezing Layers
- Computer Science, NIPS 2017
- 2017
This extended abstract proposes to only train the hidden layers for a set portion of the training run, freezing them out one-by-one and excluding them from the backward pass, demonstrating savings of up to 20% wall-clock time during training.
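As a rough sketch of progressive freezing: the linear schedule and t0 value below are illustrative assumptions, not the paper's cubic-scheduled, learning-rate-annealed version.

```python
def freeze_out(layers, epoch, total_epochs, t0=0.5):
    """Freeze the earliest layers first: layer i stops training once the run
    passes its scheduled fraction of the total epochs."""
    n = len(layers)
    for i, layer in enumerate(layers):
        freeze_at = t0 + (1.0 - t0) * i / max(n - 1, 1)
        if epoch / total_epochs >= freeze_at:
            for p in layer.parameters():
                p.requires_grad_(False)   # excluded from the backward pass from now on
```

In practice the frozen parameters would also be dropped from the optimizer so that the shortened backward pass actually yields the wall-clock savings.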
References
Shakeout: A New Regularized Deep Neural Network Training Scheme
- Computer Science, AAAI
- 2016
This paper presents a new training scheme, Shakeout, which leads to a combination of L1 and L2 regularization imposed on the weights, a combination proved effective in practice by Elastic Net models.
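For reference, the sketch below computes the combined L1 + L2 (Elastic-Net-style) weight penalty that Shakeout is shown to induce; the stochastic Shakeout operation itself, which randomly perturbs unit contributions during training, is not reproduced here.

```python
def elastic_net_penalty(model, l1=1e-5, l2=1e-4):
    """Combined L1 + L2 penalty over all model weights (illustrative coefficients)."""
    l1_term = sum(p.abs().sum() for p in model.parameters())
    l2_term = sum((p ** 2).sum() for p in model.parameters())
    return l1 * l1_term + l2 * l2_term
```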
Aggregated Residual Transformations for Deep Neural Networks
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
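The aggregated transformation can be expressed compactly as a grouped convolution, as in the hedged PyTorch sketch below; channel counts and cardinality are illustrative, and bottleneck_ch must be divisible by cardinality.

```python
import torch.nn as nn

def resnext_branch(in_ch, bottleneck_ch, cardinality=32):
    """Aggregated transformation written as a single grouped 3x3 convolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, bottleneck_ch, 1, bias=False),
        nn.BatchNorm2d(bottleneck_ch), nn.ReLU(inplace=True),
        nn.Conv2d(bottleneck_ch, bottleneck_ch, 3, padding=1,
                  groups=cardinality, bias=False),          # cardinality parallel paths
        nn.BatchNorm2d(bottleneck_ch), nn.ReLU(inplace=True),
        nn.Conv2d(bottleneck_ch, in_ch, 1, bias=False),
        nn.BatchNorm2d(in_ch),
    )
```

The output of such a branch is added to the identity shortcut, as in a standard bottleneck residual block.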
Identity Mappings in Deep Residual Networks
- Computer Science, ECCV
- 2016
The propagation formulations behind the residual building blocks suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
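In the paper's notation (x_l the input to block l, F the residual function, E the loss), identity skip connections make the signal propagate additively across any number of blocks:

```latex
x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l)
\;\;\Rightarrow\;\;
x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i),
\qquad
\frac{\partial \mathcal{E}}{\partial x_l}
= \frac{\partial \mathcal{E}}{\partial x_L}
  \left(1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i)\right)
```

The additive "1" term is what lets gradients reach shallow blocks directly, without passing through a product of weight layers.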
Deep Networks with Stochastic Depth
- Computer Science, ECCV
- 2016
Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup to train short networks and use deep networks at test time and reduces training time substantially and improves the test error significantly on almost all data sets that were used for evaluation.
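A minimal PyTorch sketch of the idea, assuming a fixed per-block survival probability rather than the paper's linearly decaying schedule over depth:

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block that is randomly bypassed during training with survival
    probability p; at test time its output is scaled by p."""
    def __init__(self, block, survival_prob=0.8):
        super().__init__()
        self.block, self.p = block, survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.p:
                return x + self.block(x)     # block is active this pass
            return x                          # block is skipped entirely
        return x + self.p * self.block(x)     # expected contribution at test time
```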
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
- Computer Science, AAAI
- 2017
Clear empirical evidence is given that training with residual connections significantly accelerates the training of Inception networks, and several new streamlined architectures for both residual and non-residual Inception networks are presented.
Wide Residual Networks
- Computer Science, BMVC
- 2016
This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.
Deep Residual Learning for Image Recognition
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Adding Gradient Noise Improves Learning for Very Deep Networks
- Computer Science, ArXiv
- 2015
This paper explores the low-overhead and easy-to-implement optimization technique of adding annealed Gaussian noise to the gradient, which is found to be surprisingly effective when training very deep architectures.
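A short sketch of the technique, intended to be called between loss.backward() and optimizer.step(); the annealed variance eta / (1 + t)^gamma follows the paper's form, though the constants here are only illustrative.

```python
import torch

def add_annealed_gradient_noise(parameters, step, eta=0.3, gamma=0.55):
    """Add zero-mean Gaussian noise to every gradient, with variance annealed
    over training steps."""
    sigma = (eta / (1 + step) ** gamma) ** 0.5
    with torch.no_grad():
        for p in parameters:
            if p.grad is not None:
                p.grad.add_(torch.randn_like(p.grad) * sigma)
```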
Learning Multiple Layers of Features from Tiny Images
- Computer Science
- 2009
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Computer Science, ICML
- 2015
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
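For reference, the normalization applied to each activation over a mini-batch B of size m, with learned scale gamma and shift beta:

```latex
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},
\qquad
y_i = \gamma \hat{x}_i + \beta,
\qquad
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i,\quad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2
```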