Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise

Senwei Liang, Zhongzhan Huang, Mingfu Liang, Haizhao Yang
Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of an input image using the statistics of a batch of images, and hence BN introduces noise into the gradient of the training loss. Previous work indicates that this noise is important for the optimization and generalization of deep neural networks, but that too much noise harms network performance. In our paper, we offer a new point of view: a self-attention mechanism can help to regulate the noise by enhancing instance…
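The batch dependence described in the abstract can be illustrated with a minimal sketch. It assumes scalar features and omits BN's learnable scale and shift; the function name `batch_norm` and the example values are illustrative only and are not taken from the paper.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize scalar features with the mean and variance of the batch.

    Simplification (assumption): no learnable scale/shift, one feature per sample.
    """
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

# The same sample (value 1.0) is placed in two different batches.
sample = 1.0
out_a = batch_norm([sample, 0.0, 2.0])[0]  # batch mean equals the sample
out_b = batch_norm([sample, 3.0, 5.0])[0]  # batch mean is above the sample

print(out_a, out_b)
```

The normalized value of the sample changes with its batch companions, which is the source of the batch noise the abstract refers to.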
Representative Batch Normalization With Feature Calibration
This work proposes to add a simple yet effective feature calibration scheme into the centering and scaling operations of BatchNorm, enhancing instance-specific representations at negligible computational cost.
CrossNorm and SelfNorm for Generalization under Distribution Shifts
CrossNorm exchanges channel-wise mean and variance between feature maps to enlarge the training distribution, while SelfNorm uses attention to recalibrate the statistics to bridge gaps between training and test distributions.
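A rough sketch of the statistic exchange mentioned in this summary follows. It treats a single channel of each feature map as a flat list; the helper names `channel_stats` and `swap_stats` are hypothetical, and this is not the authors' implementation.

```python
import math

def channel_stats(channel, eps=1e-5):
    """Return the mean and (eps-stabilized) standard deviation of one channel."""
    mean = sum(channel) / len(channel)
    var = sum((x - mean) ** 2 for x in channel) / len(channel)
    return mean, math.sqrt(var + eps)

def swap_stats(channel_a, channel_b):
    """Standardize each channel, then re-scale it with the other's statistics."""
    mean_a, std_a = channel_stats(channel_a)
    mean_b, std_b = channel_stats(channel_b)
    new_a = [(x - mean_a) / std_a * std_b + mean_b for x in channel_a]
    new_b = [(x - mean_b) / std_b * std_a + mean_a for x in channel_b]
    return new_a, new_b

a = [0.0, 1.0, 2.0]
b = [10.0, 20.0, 30.0]
new_a, new_b = swap_stats(a, b)
print(new_a, new_b)
```

After the swap, each channel keeps its own spatial pattern but carries the other channel's mean and spread.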
AlterSGD: Finding Flat Minima for Continual Learning by Alternative Training
A simple yet effective optimization method, called AlterSGD, is proposed to search for flat minima in the loss landscape; it significantly mitigates forgetting and outperforms state-of-the-art methods by a large margin under challenging continual learning protocols.
Blending Pruning Criteria for Convolutional Neural Networks
A novel framework that integrates existing filter pruning criteria by exploring criteria diversity is proposed; it outperforms state-of-the-art baselines in terms of compact model performance after pruning.
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
A unified picture of the main motivation behind different approaches from the perspective of optimization is provided, and a taxonomy for understanding the similarities and differences between them is presented.
Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning
This study finds strong similarities among some primary pruning criteria for convolutional filters proposed in recent years, and finds that if the network has too much redundancy, these criteria cannot distinguish the "importance" of the filters.
Efficient Attention Network: Accelerate Attention by Searching Where to Plug
This work proposes a framework called Efficient Attention Network (EAN), which shares a single attention module within the backbone and uses reinforcement learning to search where to connect the shared module, improving the efficiency of existing attention modules.


Adaptive Batch Normalization for practical domain adaptation
This paper proposes a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN), to increase the generalization ability of a DNN, and demonstrates that the method is complementary to other existing methods and may further improve model performance.
Group Normalization
Group Normalization can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
Understanding Batch Normalization
It is shown that BN primarily enables training with larger learning rates, which is the cause for faster convergence and better generalization, and contrasts the results against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
How Does Batch Normalization Help Optimization?
It is demonstrated that the distributional stability of layer inputs has little to do with the success of BatchNorm; instead, BatchNorm makes the optimization landscape significantly smoother, and this smoothness induces more predictive and stable behavior of the gradients, allowing for faster training.
Aggregated Residual Transformations for Deep Neural Networks
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
Towards Understanding Regularization in Batch Normalization
Batch Normalization improves both convergence and generalization in training neural networks and is analyzed by using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function.
Differentiable Learning-to-Normalize via Switchable Normalization
Switchable Normalization (SN), which learns to select different normalizers for different normalization layers of a deep neural network, is proposed; it is intended to ease the use and understanding of normalization techniques in deep learning.
Squeeze-and-Excitation Networks
  • Jie Hu, Li Shen, Gang Sun
  • Computer Science
  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018
This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
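The channel recalibration described in this summary can be sketched as a squeeze step (global average per channel) followed by a gating step (two small fully connected layers ending in a sigmoid). The sketch below is a simplification: biases are omitted, each channel is a flat list, and the weights `w1` and `w2` are made-up values rather than learned parameters.

```python
import math

def se_block(feature_maps, w1, w2):
    """Rescale each channel by a gate computed from all channel averages.

    feature_maps: list of channels, each a flat list of activations.
    w1, w2: weight matrices (lists of rows); biases omitted for brevity.
    """
    # Squeeze: one global average per channel.
    z = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, feature_maps)]
    return scaled, gates

features = [[1.0, 3.0], [0.0, 0.0]]  # 2 channels, 2 positions each
w1 = [[1.0, 1.0]]                    # 2 channels -> 1 hidden unit (illustrative)
w2 = [[1.0], [-1.0]]                 # 1 hidden unit -> 2 channel gates (illustrative)
scaled, gates = se_block(features, w1, w2)
print(gates)
```

Each gate lies strictly between 0 and 1, so the block can only attenuate a channel relative to its input; which channels are attenuated depends on the averages of all channels.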
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
A reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction is presented, improving the conditioning of the optimization problem and speeding up convergence of stochastic gradient descent.
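The decoupling of length and direction can be sketched as w = g · v / ‖v‖, where the scalar g sets the length and the vector v sets the direction. The function name `weight_norm` and the example values below are illustrative.

```python
import math

def weight_norm(v, g):
    """Return w = g * v / ||v||: direction from v, length from the scalar g."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

w = weight_norm([3.0, 4.0], g=2.0)
length = math.sqrt(sum(x * x for x in w))
print(w, length)
```

Rescaling v leaves w unchanged, so the length of the weight vector is controlled only through g.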