Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning

@article{wang2022global,
  title={Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning},
  author={Haoxiang Wang and Yite Wang and Ruoyu Sun and Bo Li},
  journal={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
  • Haoxiang Wang, Yite Wang, Ruoyu Sun, Bo Li
  • Published 17 March 2022
  • Computer Science
Model-agnostic meta-learning (MAML) and its variants have become popular approaches for few-shot learning. However, due to the non-convexity of deep neural nets (DNNs) and the bi-level formulation of MAML, the theoretical properties of MAML with DNNs remain largely unknown. In this paper, we first prove that MAML with over-parameterized DNNs is guaranteed to converge to global optima at a linear rate. Our convergence analysis indicates that MAML with over-parameterized DNNs is equivalent to… 
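To illustrate the bi-level formulation the analysis concerns, here is a minimal numpy sketch of one MAML meta-update on a toy 1-D least-squares model. The function name, toy model, and learning rates are illustrative assumptions, not from the paper:

```python
import numpy as np

def maml_step(w, tasks, inner_lr=0.1, outer_lr=0.01):
    """One MAML meta-update for the scalar model y = w * x.

    Inner level: adapt w with one gradient step per task.
    Outer level: update w using the post-adaptation loss,
    differentiating through the inner step (the bi-level structure).
    """
    meta_grad = 0.0
    for x, y in tasks:
        # Inner step: task-specific adaptation.
        grad = 2 * np.mean((w * x - y) * x)
        w_adapted = w - inner_lr * grad
        # d(w_adapted)/dw for this quadratic loss.
        dwa_dw = 1 - inner_lr * 2 * np.mean(x * x)
        # Outer gradient: chain rule through the inner update.
        meta_grad += 2 * np.mean((w_adapted * x - y) * x) * dwa_dw
    return w - outer_lr * meta_grad / len(tasks)
```

With a single task whose true slope is 2, one meta-step moves w from 0 toward 2.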


Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems

This work proposes to learn a meta future-gradient generator that forecasts gradient information from the future data distribution, so that the recommendation model can be trained as if it could look ahead to its deployment period.

Meta-ticket: Finding optimal subnetworks for few-shot learning within randomly initialized neural networks

This work proposes a novel meta-learning approach, called Meta-ticket, that finds optimal sparse subnetworks for few-shot learning within randomly initialized NNs, achieving superior meta-generalization compared to MAML-based methods, especially with large NNs.

Provable Domain Generalization via Invariant-Feature Subspace Recovery

Empirically, both ISR (Invariant-Feature Subspace Recovery) variants outperform IRM on synthetic benchmarks and serve as simple yet effective post-processing methods that improve the worst-case accuracy of (pre-)trained models against spurious correlations and group shifts.

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

It is shown that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, followed by training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective

This work proposes a novel framework called training-free neural architecture search (TE-NAS), which ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space and shows that these two measurements imply the trainability and expressivity of a neural network.
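As a hedged sketch of the NTK-spectrum side of such training-free scoring, the following computes the empirical NTK of a random one-hidden-layer ReLU net and returns its condition number as a trainability proxy. This is a simplified illustration under assumed details (TE-NAS also counts linear regions and uses full architectures); all names here are hypothetical:

```python
import numpy as np

def ntk_condition_number(X, hidden=16, seed=0):
    """Condition number of the empirical NTK of a random
    one-hidden-layer ReLU net f(x) = relu(x @ W) @ v."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, hidden)) / np.sqrt(d)
    v = rng.standard_normal(hidden) / np.sqrt(hidden)
    H = X @ W                        # pre-activations, (n, hidden)
    A = np.maximum(H, 0.0)           # ReLU activations
    mask = (H > 0).astype(float)     # ReLU derivative
    # Per-sample Jacobian w.r.t. all parameters (v, W):
    # df/dv = relu(x @ W); df/dW = outer(x, v * mask).
    dW = (X[:, :, None] * (v * mask)[:, None, :]).reshape(len(X), -1)
    J = np.concatenate([A, dW], axis=1)
    ntk = J @ J.T                    # empirical NTK, (n, n)
    eig = np.linalg.eigvalsh(ntk)    # ascending eigenvalues
    return eig[-1] / eig[0]
```

Architectures whose NTK is better conditioned (smaller ratio) would be ranked as more trainable under this kind of score.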

Meta-Learning of Neural Architectures for Few-Shot Learning

The proposed MetaNAS is the first method to fully integrate NAS with gradient-based meta-learning: it optimizes a meta-architecture along with the meta-weights during meta-training, and the result can be adapted to a novel task with a few steps of the task optimizer.

Towards Fast Adaptation of Neural Architectures with Meta Learning

A novel transferable neural architecture search method based on meta-learning, which learns a meta-architecture that can adapt to a new task through a few gradient steps, making the transferred architecture well suited to the specific task.

Meta-Learning With Differentiable Convex Optimization

The objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories and this work exploits two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem.

Wide neural networks of any depth evolve as linear models under gradient descent

This work shows that for wide NNs the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.
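The linearization described here, a first-order Taylor expansion of the network function around its initial parameters, can be sketched generically with finite-difference Jacobians (a toy illustration, not the paper's infinite-width construction):

```python
import numpy as np

def linearize(f, w0, eps=1e-6):
    """First-order Taylor expansion of f around w0:
    f_lin(w) = f(w0) + J(w0) @ (w - w0),
    with the Jacobian J estimated by finite differences."""
    f0 = f(w0)
    # One forward difference per parameter direction.
    J = np.array([(f(w0 + eps * e) - f0) / eps
                  for e in np.eye(len(w0))]).T
    return lambda w: f0 + J @ (w - w0)
```

For wide networks the claim is that this linear model governs the entire gradient-descent trajectory; for a generic smooth f it is only accurate near w0.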

DARTS: Differentiable Architecture Search

The proposed algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques.
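The core continuous relaxation behind DARTS can be sketched in a few lines: each edge's output is a softmax-weighted mixture of all candidate operations, so the architecture parameters become differentiable. A minimal numpy illustration (function and variable names are assumptions, not the paper's API):

```python
import numpy as np

def mixed_op(x, alphas, ops):
    """DARTS-style continuous relaxation of a discrete op choice:
    output = sum_i softmax(alphas)_i * ops[i](x)."""
    w = np.exp(alphas - alphas.max())   # numerically stable softmax
    w = w / w.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops))
```

When one alpha dominates after training, the mixture collapses toward a single op, which is then selected for the discrete architecture.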

Neural tangent kernel: convergence and generalization in neural networks (invited paper)

This talk will introduce this formalism and give a number of results on the Neural Tangent Kernel and explain how they give us insight into the dynamics of neural networks during training and into their generalization features.

Auto-Meta: Automated Gradient Based Meta Learner Search

This work verifies that automated architecture search synergizes with the effect of gradient-based meta learning, and adopts the progressive neural architecture search to find optimal architectures for meta-learners.