Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise

Senwei Liang, Zhongzhan Huang, Mingfu Liang, Haizhao Yang
Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the features of an input image using the statistics of a batch of images, and hence BN introduces noise into the gradient of the training loss. Previous works indicate that this noise is important for the optimization and generalization of deep neural networks, but that too much noise harms network performance. In our paper, we offer a new point of view: a self-attention mechanism can help regulate the noise by enhancing instance…
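To make the source of the batch noise concrete, the following is a minimal sketch in plain Python of the normalization step only. It assumes scalar features and omits BN's learned scale and shift; the function name `batch_norm` and the example batches are illustrative, and this is not the attention-based method proposed in the paper. It shows that the same instance receives a different normalized value depending on which other samples share its batch.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize scalar features with the mean and variance of the batch.

    Assumption: no learned scale/shift parameters, scalar features only.
    """
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance, as used by BN during training.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

# The same instance (value 1.0) placed in two different batches.
out_a = batch_norm([1.0, 2.0, 3.0])
out_b = batch_norm([1.0, 2.0, 6.0])

# The normalized value of the first instance depends on its batch mates.
print(out_a[0], out_b[0])
```

The gap between the two printed values is the batch-dependent perturbation that the abstract refers to as noise.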
Revisiting Batch Normalization
This work revisits the BN formulation and presents a new initialization method and update approach for BN to address the aforementioned issues, along with a new online BN-based input data normalization technique that alleviates the need for other offline or fixed methods.
Representative Batch Normalization with Feature Calibration
This work proposes adding a simple yet effective feature calibration scheme to the centering and scaling operations of BatchNorm, enhancing instance-specific representations at negligible computational cost.
CrossNorm and SelfNorm for Generalization under Distribution Shifts
CrossNorm exchanges channel-wise mean and variance between feature maps to enlarge the training distribution, while SelfNorm uses attention to recalibrate the statistics to bridge gaps between training and test distributions.
AlterSGD: Finding Flat Minima for Continual Learning by Alternative Training
AlterSGD is a simple yet effective optimization method that searches for flat minima in the loss landscape; it significantly mitigates forgetting and outperforms state-of-the-art methods by a large margin under challenging continual learning protocols.
Blending Pruning Criteria for Convolutional Neural Networks
A novel framework is proposed that integrates existing filter pruning criteria by exploring their diversity; it outperforms state-of-the-art baselines with regard to the performance of the compact model after pruning.
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
A unified picture of the main motivation behind different approaches from the perspective of optimization is provided, and a taxonomy for understanding the similarities and differences between them is presented.
Convolution-Weight-Distribution Assumption: Rethinking the Criteria of Channel Pruning
This study finds strong similarities among several primary pruning criteria proposed in recent years for convolutional filters, and finds that if the network has too much redundancy, these criteria cannot distinguish the "importance" of the filters.
Rethinking the Pruning Criteria for Convolutional Neural Network
Channel pruning is a popular technique for compressing convolutional neural networks (CNNs), where various pruning criteria have been proposed to remove the redundant filters. From our comprehensive
Efficient Attention Network: Accelerate Attention by Searching Where to Plug
This work proposes a framework called Efficient Attention Network (EAN), which shares a single attention module within the backbone and uses reinforcement learning to search where to connect it, improving the efficiency of existing attention modules.


Adaptive Batch Normalization for practical domain adaptation
This paper proposes a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN), to increase the generalization ability of a DNN, and demonstrates that the method is complementary to other existing methods and may further improve model performance.
Group Normalization
Group Normalization can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
Understanding Batch Normalization
It is shown that BN primarily enables training with larger learning rates, which is the cause of faster convergence and better generalization; the results are contrasted with recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
How Does Batch Normalization Help Optimization?
It is demonstrated that the distributional stability of layer inputs has little to do with the success of BatchNorm; instead, BatchNorm makes the optimization landscape smoother, and this smoothness induces more predictive and stable gradient behavior, allowing for faster training.
Aggregated Residual Transformations for Deep Neural Networks
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
Towards Understanding Regularization in Batch Normalization
Batch Normalization, which improves both convergence and generalization in training neural networks, is analyzed using a basic block of neural networks consisting of a kernel layer, a BN layer, and a nonlinear activation function.
Differentiable Learning-to-Normalize via Switchable Normalization
Switchable Normalization (SN), which learns to select different normalizers for different normalization layers of a deep neural network, is proposed; it helps ease the usage and understanding of normalization techniques in deep learning.
Squeeze-and-Excitation Networks
Jie Hu, Li Shen, Gang Sun. Computer Science. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels; SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
A reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction is presented, improving the conditioning of the optimization problem and speeding up convergence of stochastic gradient descent.
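The decoupling of length and direction described in the last entry can be sketched in a few lines of plain Python. This assumes the commonly stated form w = g · v / ‖v‖, where `v` carries the direction and the scalar `g` carries the length; the function name `weight_norm` and the example values are illustrative, not taken from the paper's code.

```python
import math

def weight_norm(v, g):
    """Return w = g * v / ||v||, so the length of w is |g| whatever the length of v."""
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / norm for x in v]

# Two direction vectors that point the same way but differ in length by 10x.
w1 = weight_norm([3.0, 4.0], g=2.0)
w2 = weight_norm([30.0, 40.0], g=2.0)

# Both produce (numerically) the same weight vector, with length equal to g.
print(w1, w2)
```

Rescaling `v` leaves the resulting weights unchanged, so only `g` controls their magnitude.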