Corpus ID: 212628682

TaskNorm: Rethinking Batch Normalization for Meta-Learning

@article{Bronskill2020TaskNormRB,
  title={TaskNorm: Rethinking Batch Normalization for Meta-Learning},
  author={John Bronskill and Jonathan Gordon and James Requeima and Sebastian Nowozin and Richard E. Turner},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.03284}
}
Modern meta-learning approaches for image classification rely on increasingly deep networks to achieve state-of-the-art performance, making batch normalization an essential component of meta-learning pipelines. However, the hierarchical nature of the meta-learning setting presents several challenges that can render conventional batch normalization ineffective, giving rise to the need to rethink normalization in this setting. We evaluate a range of approaches to batch normalization for meta… 
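As a rough, hedged illustration of why conventional batch normalization is awkward in the episodic (support/query) setting the abstract describes, the sketch below contrasts standard batch normalization with a variant that normalizes a task's query examples using moments computed from that task's support set; the function names and NumPy setup are illustrative assumptions, not the authors' TaskNorm implementation.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Conventional batch normalization: standardize each feature with
    moments computed from the batch x itself."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def support_based_norm(support, query, eps=1e-5):
    """Illustrative meta-learning-friendly variant: normalize both support and
    query examples with moments estimated from the support set only, so a query
    prediction does not depend on the other query examples in the batch."""
    mu, var = support.mean(axis=0), support.var(axis=0)
    norm = lambda x: (x - mu) / np.sqrt(var + eps)
    return norm(support), norm(query)

rng = np.random.default_rng(0)
support = rng.normal(loc=2.0, scale=3.0, size=(5, 8))   # tiny per-task support set
query = rng.normal(loc=2.0, scale=3.0, size=(20, 8))

s_norm, q_norm = support_based_norm(support, query)
print(q_norm.mean(axis=0))   # roughly centered, using support statistics only
```

With only a handful of support examples per task, the support-set moments are noisy, which is one reason simple choices like this trade off against transductive or instance-based alternatives.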
MetaNorm: Learning to Normalize Few-Shot Batches Across Domains
TLDR
MetaNorm is generic, flexible, and model-agnostic: a simple plug-and-play module that can be seamlessly embedded into existing meta-learning approaches and efficiently implemented by lightweight hypernetworks with low computational cost.
Few-shot Open-set Recognition by Transformation Consistency
TLDR
This paper proposes a novel unknown class sample detector, named SnaTCHer, that does not require pseudo-unseen samples and alters the unseen class distribution estimation problem to a relative feature transformation problem, independent of pseudo-unseen class samples.
Semantics-driven Attentive Few-shot Learning over Clean and Noisy Samples
TLDR
This work proposes semantically-conditioned feature attention and sample attention mechanisms that estimate the importance of representation dimensions and training instances and demonstrates the effectiveness of the proposed semantic FSL model with and without sample noise.
Bridging Few-Shot Learning and Adaptation: New Challenges of Support-Query Shift
TLDR
This work addresses the new and challenging problem of Few-Shot Learning under Support/Query Shift (FSQS), i.e., when support and query instances are sampled from related but different distributions, and studies the roles of both Batch Normalization and Optimal Transport in aligning distributions.
Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning
TLDR
This work proposes a variance-sensitive class of models that operate in a low-label regime and employs a hierarchically regularized Mahalanobis-distance based classifier combined with a state-of-the-art neural adaptive feature extractor to achieve strong performance on the Meta-Dataset, mini-ImageNet, and tiered-ImageNet benchmarks.
CSHE: network pruning by using cluster similarity and matrix eigenvalues
TLDR
This work proposes a novel filter pruning method that combines convolution filter and feature map information for convolutional neural network compression, namely network pruning by using cluster similarity and large eigenvalues (CSHE).
Continual Normalization: Rethinking Batch Normalization for Online Continual Learning
TLDR
This work studies the cross-task normalization effect of BN in online continual learning where BN normalizes the testing data using moments biased towards the current task, resulting in higher catastrophic forgetting.
Delving into the Estimation Shift of Batch Normalization in a Network
TLDR
This paper defines the estimation shift magnitude of BN to quantitatively measure the difference between its estimated population statistics and expected ones and designs a batch-free normalization (BFN) that can block such an accumulation of estimation shift.
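To make the quantity in this entry concrete, here is a small, hedged sketch (not the paper's exact definition) that compares BN-style exponential-moving-average estimates of the population moments with moments computed directly from the data:

```python
import numpy as np

def running_moments(batches, momentum=0.1):
    """Exponential-moving-average mean/variance, as BN maintains during training."""
    mean = np.zeros(batches[0].shape[1])
    var = np.ones(batches[0].shape[1])
    for b in batches:
        mean = (1 - momentum) * mean + momentum * b.mean(axis=0)
        var = (1 - momentum) * var + momentum * b.var(axis=0)
    return mean, var

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=(10_000, 4))
batches = np.split(data, 100)                      # many small batches -> noisy estimates

est_mean, est_var = running_moments(batches)
gap = np.abs(est_mean - data.mean(axis=0)) + np.abs(est_var - data.var(axis=0))
print("per-feature gap between estimated and empirical moments:", gap)
```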
Diagnosing Batch Normalization in Class Incremental Learning
TLDR
This paper investigates the influence of BN on Class-IL models by illustrating the BN dilemma and proposes BN Tricks to address the issue by training a better feature extractor while eliminating classification bias.

References

SHOWING 1-10 OF 45 REFERENCES
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
TLDR
This work proposes Meta-Dataset: a new benchmark for training and evaluating models that is large-scale, consists of diverse datasets, and presents more realistic tasks, and proposes a new set of baselines for quantifying the benefit of meta-learning in Meta-Dataset.
On First-Order Meta-Learning Algorithms
TLDR
A family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates, including Reptile, which works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task.
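Because the entry above spells out the Reptile update (train on a sampled task, then move the initialization towards the adapted weights), a minimal sketch is easy to give; the toy linear-regression task family and hyperparameters below are illustrative assumptions:

```python
import numpy as np

def reptile(init, sample_task, iterations=1000, inner_steps=5, inner_lr=0.01, outer_lr=0.1):
    """Reptile outer loop: adapt to one sampled task with plain SGD, then nudge
    the meta-initialization towards the task-adapted weights."""
    theta = init.copy()
    for _ in range(iterations):
        X, y = sample_task()
        phi = theta.copy()
        for _ in range(inner_steps):
            grad = 2 * X.T @ (X @ phi - y) / len(y)   # least-squares gradient
            phi -= inner_lr * grad
        theta += outer_lr * (phi - theta)             # first-order meta-update
    return theta

rng = np.random.default_rng(0)

def sample_task():
    """Toy task family: linear regression with a task-specific weight vector."""
    w = rng.normal(size=3)
    X = rng.normal(size=(16, 3))
    return X, X @ w + 0.1 * rng.normal(size=16)

theta = reptile(np.zeros(3), sample_task)
print(theta)   # the learned initialization
```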
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
Instance Normalization: The Missing Ingredient for Fast Stylization
TLDR
A small change in the stylization architecture results in a significant qualitative improvement in the generated images, and can be used to train high-performance architectures for real-time image generation.
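For contrast with the batch-statistics issues discussed throughout this page, a minimal sketch of instance normalization under its standard definition (not the paper's code): moments are computed per example and per channel over spatial positions only.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization for an NCHW tensor: each (example, channel) slice
    is standardized with its own spatial mean and variance, so no information
    is shared across the batch."""
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(size=(2, 3, 8, 8))   # batch of 2, 3 channels, 8x8 maps
print(instance_norm(x).mean(axis=(2, 3)))                # ~0 for every example and channel
```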
One shot learning of simple visual concepts
TLDR
A generative model of how characters are composed from strokes is introduced, where knowledge from previous characters helps to infer the latent strokes in novel characters, using a massive new dataset of handwritten characters.
Learning (to learn) from few examples
TLDR
This thesis proposes techniques that sidestep per-task data scarcity by leveraging a large number of small episodes, each characterised by a limited training set, for both tracking and classification.
Recasting Gradient-Based Meta-Learning as Hierarchical Bayes
TLDR
This work reformulates the model-agnostic meta-learning algorithm (MAML) of Finn et al. (2017) as a method for probabilistic inference in a hierarchical Bayesian model and proposes an improvement to the MAML algorithm that makes use of techniques from approximate inference and curvature estimation.
Prototypical Networks for Few-shot Learning
TLDR
This work proposes Prototypical Networks for few-shot classification, and provides an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning.
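As a reminder of the mechanism (a hedged sketch in embedding space; the random vectors stand in for the output of a learned feature extractor), prototypes are the mean embeddings of each class's support examples and queries are assigned to the nearest prototype:

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Class prototypes: mean embedding of each class's support examples."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the class of the nearest prototype (squared Euclidean distance)."""
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(5, 16))            # well-separated stand-in class centers
support_labels = np.repeat(np.arange(5), 2)              # 5-way, 2-shot episode
support_emb = centers[support_labels] + rng.normal(size=(10, 16))
query_emb = centers + rng.normal(size=(5, 16))           # one query per class

protos = prototypes(support_emb, support_labels, n_classes=5)
print(classify(query_emb, protos))                       # expected: [0 1 2 3 4]
```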
Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes
The goal of this paper is to design image classification systems that, after an initial multi-task training phase, can automatically adapt to new tasks encountered at test time. We introduce a conditional neural process based approach to the multi-task classification setting for this purpose.
Meta-Learning Probabilistic Inference for Prediction
TLDR
VERSA is introduced, an instance of the framework employing a flexible and versatile amortization network that takes few-shot learning datasets as inputs, with arbitrary numbers of shots, and outputs a distribution over task-specific parameters in a single forward pass, amortizing the cost of inference and relieving the need for second derivatives during training.