Attentive Normalization

Xilai Li, Wei Sun, Tianfu Wu
Batch Normalization (BN) is a vital pillar in the development of deep learning, with many recent variations such as Group Normalization (GN) and Switchable Normalization. [...] Squeeze-and-Excitation (SE) explicitly learns how to adaptively recalibrate channel-wise feature responses. Feature normalization and channel-wise attention have, however, been studied separately. In this paper, we propose a novel and lightweight integration of feature normalization and feature channel-wise attention. We present Attentive Normalization (AN) as a simple and unified alternative. AN…
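The abstract names two building blocks, BN and SE-style channel attention, that were previously applied as separate steps. The sketch below shows those two steps one after the other in plain Python. It is not the AN method itself, whose details are truncated above. The function names, the nested-list tensor layout (samples, channels, spatial values), and the bias-free weights `w1` and `w2` are assumptions made for illustration.

```python
import math

def batch_norm(x, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over the batch.

    x: list of samples; each sample is a list of channels;
    each channel is a flat list of spatial values.
    Learnable scale and shift are omitted for brevity.
    """
    n_channels = len(x[0])
    out = [[None] * n_channels for _ in x]
    for c in range(n_channels):
        values = [v for sample in x for v in sample[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        for i, sample in enumerate(x):
            out[i][c] = [(v - mean) * scale for v in sample[c]]
    return out

def se_recalibrate(x, w1, w2):
    """SE-style channel attention: squeeze, excite, rescale.

    w1: hidden x channels weights, w2: channels x hidden weights
    (hypothetical, bias-free, chosen by the caller).
    """
    out = []
    for sample in x:
        # Squeeze: global average pooling gives one value per channel.
        squeezed = [sum(ch) / len(ch) for ch in sample]
        # Excite: linear layer + ReLU, then linear layer + sigmoid.
        hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
                  for row in w1]
        gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
                 for row in w2]
        # Rescale: multiply every value in a channel by that channel's gate.
        out.append([[g * v for v in ch] for g, ch in zip(gates, sample)])
    return out

# Toy input: 2 samples, 2 channels, 2 spatial positions per channel.
x = [[[1.0, 2.0], [0.0, 4.0]],
     [[3.0, 4.0], [2.0, 2.0]]]
w1 = [[0.5, -0.5]]      # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channel gates
y = se_recalibrate(batch_norm(x), w1, w2)
```

Here the normalization statistics are shared across the batch, while the channel gates are computed per sample. That contrast is what makes combining the two steps a natural question.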
Exemplar Normalization for Learning Deep Representation
This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.
Revisiting Batch Normalization
This work revisits the BN formulation and presents a new initialization method and update approach for BN to address the aforementioned issues; it also presents a new online BN-based input data normalization technique that alleviates the need for other offline or fixed methods.
SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks
Based on well-known neuroscience theories, this paper proposes optimizing an energy function to find the importance of each neuron, derives a fast closed-form solution for that energy function, and shows that the solution can be implemented in fewer than ten lines of code.
Representative Batch Normalization with Feature Calibration
This work proposes adding a simple yet effective feature calibration scheme to the centering and scaling operations of BatchNorm, enhancing instance-specific representations at negligible computational cost.
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
A unified picture of the main motivation behind different approaches from the perspective of optimization is provided, and a taxonomy for understanding the similarities and differences between them is presented.
Sandwich Batch Normalization
This work demonstrates the prevailing effectiveness of SaBN as a drop-in replacement in four tasks: neural architecture search, conditional image generation, adversarial training, and arbitrary style transfer, and provides visualizations and analysis to help understand why SaBN works.
Separable Batch Normalization for Robust Facial Landmark Localization with Cross-protocol Network Training
This work presents a novel Separable Batch Normalization module with a Cross-protocol Network Training (CNT) strategy for robust facial landmark localization, together with a novel attention mechanism that assigns different weights to each branch for automatic selection.
Learning Local-Global Contextual Adaptation for Fully End-to-End Bottom-Up Human Pose Estimation
The proposed LOGO-CAP is end-to-end trainable with near real-time inference speed, obtaining state-of-the-art performance on the COCO keypoint benchmark for bottom-up human pose estimation.
STC speaker recognition systems for the NIST SRE 2021
This paper presents a description of the STC Ltd. systems submitted to the NIST 2021 Speaker Recognition Evaluation for both fixed and open training conditions. These systems consist of a number of…
Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection
The proposed MonoCon method is motivated at a high level by the Cramér–Wold theorem in measure theory; it outperforms all prior work on the leaderboard in the car category and obtains comparable accuracy on the pedestrian and cyclist categories.


Fixup Initialization: Residual Learning Without Normalization
This work proposes fixed-update initialization (Fixup), an initialization motivated by solving the exploding and vanishing gradient problem at the beginning of training via properly rescaling a standard initialization that enables residual networks without normalization to achieve state-of-the-art performance in image classification and machine translation.
Differentiable Learning-to-Normalize via Switchable Normalization
Switchable Normalization (SN), which learns to select different normalizers for different normalization layers of a deep neural network, is proposed; it helps ease the use and understanding of normalization techniques in deep learning.
SSN: Learning Sparse Switchable Normalization via SparsestMax
Sparse Switchable Normalization (SSN) is presented, where the importance ratios are constrained to be sparse, and this constrained optimization problem is turned into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.
Training Faster by Separating Modes of Variation in Batch-Normalized Models
M. Kalayeh, M. Shah. Computer Science, Mathematics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
This work studies BN from the viewpoint of Fisher kernels that arise from generative probability models, and proposes a mixture of Gaussian densities for batch normalization, which reduces the number of gradient updates required to reach the maximum test accuracy of the batch-normalized model.
Mode Normalization
By extending the normalization to more than a single mean and variance, this work detects modes of data on-the-fly, jointly normalizing samples that share common features.
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
A reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction is presented, improving the conditioning of the optimization problem and speeding up convergence of stochastic gradient descent.
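The decoupling described above is commonly written as w = g · v / ‖v‖, where the vector v sets the direction and the scalar g sets the length. A minimal plain-Python sketch follows; the function name `weight_norm` and the toy values are illustrative assumptions, not taken from the paper's code.

```python
import math

def weight_norm(v, g):
    """Return w = g * v / ||v||: direction comes from v, length from g."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [g * vi / norm for vi in v]

v = [3.0, 4.0]
w = weight_norm(v, g=2.0)

# Rescaling v leaves w unchanged; only g controls the length of w.
w_rescaled = weight_norm([10.0 * vi for vi in v], g=2.0)
length_w = math.sqrt(sum(wi * wi for wi in w))
```

Because the length of w always equals |g|, an optimizer can adjust the scale and the direction of a weight vector through separate parameters.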
Residual Attention Network for Image Classification
The proposed Residual Attention Network is a convolutional neural network that uses an attention mechanism; it can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion and can be easily scaled up to hundreds of layers.
Squeeze-and-Excitation Networks
This work proposes a novel architectural unit, termed the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets.
Layer Normalization
Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called…
Attention is All you Need
This work proposes a new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.