• Corpus ID: 102351532

Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift

Michal Zajac, Konrad Zolna, Stanislaw Jastrzebski
Recent work has shown that using unlabeled data in semi-supervised learning (SSL) is not always beneficial and can even hurt generalization, especially when there is a class mismatch between the unlabeled and labeled examples. We investigate this phenomenon for image classification on the CIFAR-10 and ImageNet datasets, and under many other forms of domain shift (e.g. salt-and-pepper noise). Our main contribution is Split Batch Normalization (Split-BN), a technique to improve SSL when the… 
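The core idea suggested by the title and abstract can be hedged as a minimal sketch: route labeled and unlabeled sub-batches through separate normalization statistics, so that domain-shifted unlabeled data cannot distort the statistics applied to labeled examples. The class below is a hypothetical 1-D illustration under that assumption, not the paper's implementation (which operates on conv feature maps inside a deep network):

```python
import numpy as np

class SplitBatchNorm:
    """Toy 1-D Split-BN: one set of running statistics per data stream."""
    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.eps = eps
        self.momentum = momentum
        # Separate (running mean, running var) per stream, so the
        # unlabeled stream never pollutes the labeled statistics.
        self.stats = {s: [np.zeros(num_features), np.ones(num_features)]
                      for s in ("labeled", "unlabeled")}

    def __call__(self, x, stream="labeled"):
        # Normalize with the current batch's statistics (training mode)
        # and update only the running statistics of this stream.
        mean, var = x.mean(axis=0), x.var(axis=0)
        run = self.stats[stream]
        run[0] = (1 - self.momentum) * run[0] + self.momentum * mean
        run[1] = (1 - self.momentum) * run[1] + self.momentum * var
        return (x - mean) / np.sqrt(var + self.eps)
```

A full version would also carry learned scale/shift parameters (shared or per-stream is a design choice the paper would settle).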


Entropy Repulsion for Semi-supervised Learning Against Class Mismatch

This work proposes a new technique, entropy repulsion for mismatch (ERCM), to improve SSL in class-mismatch situations, and demonstrates that ERCM can significantly improve the performance of state-of-the-art SSL algorithms, namely Mean Teacher, Virtual Adversarial Training (VAT) and MixMatch, in various class-mismatch cases.

Does Data Augmentation Benefit from Split BatchNorms

A recently proposed training paradigm using an auxiliary BatchNorm for potentially out-of-distribution, strongly augmented images is explored; this method significantly improves performance on common image classification benchmarks such as CIFAR-10, CIFAR-100, and ImageNet.

A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation*

It is demonstrated that the devil is in the details: a set of simple designs and training techniques can collectively improve the performance of semi-supervised semantic segmentation significantly.

Sandwich Batch Normalization

This work demonstrates the prevailing effectiveness of SaBN as a drop-in replacement in four tasks: neural architecture search, conditional image generation, adversarial training, and arbitrary style transfer, and provides visualizations and analysis to help understand why SaBN works.

Sandwich Batch Normalization: A Drop-In Replacement for Feature Distribution Heterogeneity

This work demonstrates the prevailing effectiveness of SaBN as a drop-in replacement in four tasks: conditional image generation, neural architecture search, adversarial training, and arbitrary style transfer, and provides visualizations and analysis to help understand why SaBN works.

Unsupervised Domain Adaptation for Person Re-Identification through Source-Guided Pseudo-Labeling

This work introduces a framework that relies on a two-branch architecture optimizing classification and triplet-loss-based metric learning in source and target domains, respectively, to allow adaptability to the target domain while ensuring robustness to noisy pseudo-labels.

Separable Batch Normalization for Robust Facial Landmark Localization with Cross-protocol Network Training

A novel Separable Batch Normalization (SepBN) method is presented, different from the classical BN layer, that learns multiple sets of mapping parameters to adaptively scale and shift the normalized feature maps via a feed-forward attention mechanism.

Rethinking "Batch" in BatchNorm

This paper thoroughly reviews problems in visual recognition tasks, and shows that a key to address them is to rethink different choices in the concept of “batch” in BatchNorm.

AugMax: Adversarial Composition of Random Augmentations for Robust Training

A disentangled normalization module, termed DuBIN (Dual-Batch-and-Instance Normalization), is designed that disentangles the instance-wise feature heterogeneity arising from AugMax, a stronger form of data augmentation that leads to a significantly augmented input distribution which makes model training more challenging.

Frequency Principle in deep learning: an overview

  • Z. Xu
  • Computer Science
  • 2019
The low-frequency implicit bias illustrates the strength of neural networks at learning low-frequency functions, and their difficulty at learning high-frequency functions, and further advances the study of deep learning from a frequency perspective.

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms

This work creates a unified reimplementation and evaluation platform for widely-used SSL techniques and finds that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples.

Revisiting Batch Normalization For Practical Domain Adaptation

This paper proposes a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN) to increase the generalization ability of a DNN, and demonstrates that the method is complementary with other existing methods and may further improve model performance.
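As a rough illustration of the AdaBN idea summarized above — replace a BN layer's source-domain statistics with statistics computed on target-domain features, while keeping the learned affine parameters — here is a minimal sketch; the function names and the dict-based layer representation are hypothetical, not the paper's code:

```python
import numpy as np

def adabn_recalibrate(bn_stats, target_features):
    """Swap source-domain running mean/var for target-domain statistics.
    The learned scale (gamma) and shift (beta) are left untouched."""
    bn_stats["mean"] = target_features.mean(axis=0)
    bn_stats["var"] = target_features.var(axis=0)
    return bn_stats

def bn_inference(x, bn_stats, eps=1e-5):
    # Standard BN inference pass using the (re-)calibrated statistics.
    x_hat = (x - bn_stats["mean"]) / np.sqrt(bn_stats["var"] + eps)
    return bn_stats["gamma"] * x_hat + bn_stats["beta"]
```

After recalibration, target-domain inputs are normalized by their own statistics, so the downstream layers see roughly the same feature distribution they saw during source-domain training.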

Temporal Ensembling for Semi-Supervised Learning

Self-ensembling is introduced, where it is shown that this ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.

Strong Baselines for Neural Semi-Supervised Learning under Domain Shift

This paper re-evaluates classic general-purpose bootstrapping approaches in the context of neural networks under domain shift against recent neural approaches, and proposes a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training.

Robust Semi-Supervised Learning when Labels are Missing at Random

A semi-supervised learning approach is developed that relaxes restrictive assumptions about the unlabeled features, provides classifiers that reliably measure label uncertainty, and is applicable with any generative model and supervised learning algorithm.

Training Faster by Separating Modes of Variation in Batch-Normalized Models

  • M. Kalayeh, M. Shah
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2020
This work studies BN from the viewpoint of Fisher kernels that arise from generative probability models, and proposes a mixture of Gaussian densities for batch normalization, which reduces the number of gradient updates required to reach the maximum test accuracy of the batch-normalized model.

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
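For reference, the batch normalization transform this entry refers to can be sketched in a few lines — a simplified training-time version for an (N, C) batch, where `gamma` and `beta` are the learned scale and shift:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time BN: normalize each feature over the batch,
    then apply the learned affine transform."""
    mean = x.mean(axis=0)           # per-feature batch mean
    var = x.var(axis=0)             # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

At inference time, the batch statistics are replaced by running averages accumulated during training; the domain-shift issues discussed in this page arise precisely because those statistics depend on which data fills the batch.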

Learning Robust Representations by Projecting Superficial Statistics Out

This work aims to produce a classifier that will generalize to previously unseen domains, even when domain identifiers are not available during training, and incorporates the gray-level co-occurrence matrix (GLCM) to extract patterns that prior knowledge suggests are superficial.

Mode Normalization

By extending the normalization to more than a single mean and variance, this work detects modes of data on-the-fly, jointly normalizing samples that share common features.

Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results

The recently proposed Temporal Ensembling has achieved state-of-the-art results in several semi-supervised learning benchmarks, but it becomes unwieldy when learning large datasets, so Mean Teacher, a method that averages model weights instead of label predictions, is proposed.
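The weight-averaging step that distinguishes Mean Teacher from Temporal Ensembling can be sketched as an exponential moving average over parameters. This is a toy dict-of-scalars version for illustration, not the paper's implementation:

```python
def ema_update(teacher, student, alpha=0.99):
    """Mean-Teacher update: after each training step, move the
    teacher's parameters toward the student's via an EMA.
    alpha close to 1 means the teacher changes slowly."""
    return {name: alpha * teacher[name] + (1 - alpha) * student[name]
            for name in teacher}
```

The teacher then produces the consistency targets for the student, which avoids the per-epoch label averaging that makes Temporal Ensembling unwieldy on large datasets.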