Corpus ID: 204961482

Decoupling Adaptation from Modeling with Meta-Optimizers for Meta Learning

Sébastien M. R. Arnold, Shariq Iqbal, Fei Sha
Meta-learning methods, most notably Model-Agnostic Meta-Learning or MAML, have achieved great success in adapting to new tasks quickly, after having been trained on similar tasks. The mechanism behind their success, however, is poorly understood. We begin this work with an experimental analysis of MAML, finding that deep models are crucial for its success, even given sets of simple tasks where a linear model would suffice on any individual task. Furthermore, on image-recognition tasks, we find… 
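
The adaptation mechanism under study can be made concrete with a small sketch. Below is a first-order MAML step on toy linear-regression tasks; the setup and hyperparameters are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

# First-order MAML sketch on toy linear-regression tasks (hypothetical
# setup and hyperparameters): each task is y = a * x with a randomly
# drawn slope a.

def grad(w, X, y):
    """Gradient of the mean squared error for a linear model X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def fomaml_step(w, tasks, inner_lr=0.01, meta_lr=0.1):
    """One first-order MAML meta-update: adapt on each task, then
    average the gradients evaluated at the adapted parameters."""
    meta_grad = np.zeros_like(w)
    for X, y in tasks:
        w_adapted = w - inner_lr * grad(w, X, y)  # inner-loop adaptation
        meta_grad += grad(w_adapted, X, y)        # first-order outer gradient
    return w - meta_lr * meta_grad / len(tasks)

rng = np.random.default_rng(0)
w = np.zeros(1)
for _ in range(200):
    tasks = []
    for _ in range(4):
        a = rng.normal()                # task-specific slope
        X = rng.normal(size=(20, 1))
        tasks.append((X, a * X[:, 0]))
    w = fomaml_step(w, tasks)
```

The full (second-order) MAML differentiates through the inner update as well; the first-order variant shown here simply drops that term.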

Modular Meta-Learning with Shrinkage

This work develops general techniques based on Bayesian shrinkage to automatically discover and learn both task-specific and general reusable modules and demonstrates that this method outperforms existing meta-learning approaches in domains like few-shot text-to-speech that have little task data and long adaptation horizons.

Multi-Stage Meta-Learning for Few-Shot with Lie Group Network Constraint

This paper proposes a novel meta-learning model called Multi-Stage Meta-Learning (MSML), which constrains the network to the Stiefel manifold so that the meta-learner can perform more stable gradient descent within a limited number of steps, accelerating the adaptation process.

Meta-Learning and Representation Change

This study investigates the necessity of representation change for the ultimate goal of few-shot learning, which is solving domain-agnostic tasks, and proposes a novel meta-learning algorithm, called BOIL (Body Only update in Inner Loop), which updates only the body (extractor) of the model and freezes the head (classifier) during inner loop updates.

Celebrating Robustness in Efficient Off-Policy Meta-Reinforcement Learning

An off-policy meta-RL algorithm, abbreviated CRL (Celebrating Robustness Learning), that disentangles task-specific policy parameters from shared low-level parameters via an adapter network, learns a probabilistic latent space to extract information common across tasks, and performs temporally extended exploration.

A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory.



Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.

Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML

The ANIL (Almost No Inner Loop) algorithm is proposed: a simplification of MAML in which the inner loop is removed for all but the (task-specific) head of a MAML-trained network. Performance on the test tasks is shown to be entirely determined by the quality of the learned features, so that one can even remove the head of the network (the NIL algorithm).
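
The feature-reuse idea behind ANIL is easy to sketch: keep the body (feature extractor) frozen and adapt only the linear head in the inner loop. The random feature map and hyperparameters below are assumptions for illustration, not the authors' code:

```python
import numpy as np

# ANIL-style adaptation sketch: the "body" is a frozen random feature
# map; only the linear "head" is updated in the inner loop.

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 16))          # frozen body weights (never adapted)

def body(X):
    """Fixed feature extractor phi(x); untouched by the inner loop."""
    return np.tanh(X @ B)

def adapt_head(head, X, y, lr=0.02, steps=25):
    """Inner loop: gradient steps on the head only (squared loss)."""
    phi = body(X)
    for _ in range(steps):
        head = head - lr * 2.0 * phi.T @ (phi @ head - y) / len(y)
    return head

X = rng.normal(size=(32, 5))
y = rng.normal(size=32)
head0 = np.zeros(16)
head1 = adapt_head(head0, X, y)       # task-specific head after adaptation
```

BOIL (cited above) makes the opposite choice: it freezes the head and adapts only the body.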

Alpha MAML: Adaptive Model-Agnostic Meta-Learning

An extension to MAML is introduced that incorporates an online hyperparameter-adaptation scheme, eliminating the need to tune the meta-learning and learning rates; results on the Omniglot database demonstrate a substantial reduction in the need to tune MAML training hyperparameters and improved training stability with less sensitivity to hyperparameter choice.
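
The online scheme here builds on hypergradient-style learning-rate adaptation, in which the step size itself is updated using the dot product of successive gradients. A minimal sketch on a toy quadratic (the value of beta and the starting rate are illustrative, not the paper's settings):

```python
import numpy as np

# Hypergradient-style learning-rate adaptation, sketched for plain
# gradient descent on f(w) = ||w||^2.

def hgd_step(w, lr, g, g_prev, beta=1e-3):
    """Update the step size using the dot product of successive
    gradients, then take the descent step with the new rate."""
    lr = lr + beta * g @ g_prev
    return w - lr * g, lr

grad = lambda w: 2.0 * w              # gradient of f(w) = ||w||^2
w = np.array([3.0, -2.0])
lr = 0.01                             # initial (deliberately small) rate
g_prev = grad(w)
for _ in range(50):
    g = grad(w)
    w, lr = hgd_step(w, lr, g, g_prev)
    g_prev = g
```

While successive gradients point the same way, the rate grows; once the iterates overshoot, their dot product turns negative and the rate shrinks, so the step size self-corrects online.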

Meta-learning with differentiable closed-form solvers

The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data.
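
The closed-form base learner at the heart of this approach can be sketched directly: with fixed features, the ridge solution is a single linear solve, and every operation in it is differentiable with respect to the features. The regularization strength below is an assumed value:

```python
import numpy as np

# Ridge regression as a closed-form base learner (minimal sketch): the
# adapted weights come from one linear solve rather than inner-loop
# gradient steps.

def ridge_head(phi, y, lam=1.0):
    """w = (phi^T phi + lam I)^{-1} phi^T y."""
    d = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + lam * np.eye(d), phi.T @ y)

rng = np.random.default_rng(2)
phi = rng.normal(size=(20, 8))
w_true = rng.normal(size=8)
w = ridge_head(phi, phi @ w_true, lam=1e-6)  # recovers w_true as lam -> 0
```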

Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm

This paper finds that deep representation combined with standard gradient descent have sufficient capacity to approximate any learning algorithm, and finds that gradient-based meta-learning consistently leads to learning strategies that generalize more widely compared to those represented by recurrent models.

Towards Understanding Generalization in Gradient-Based Meta-Learning

It is experimentally demonstrated that as meta-training progresses, the meta-test solutions, obtained by adapting the meta-training solution to new tasks via a few steps of gradient-based fine-tuning, become flatter, lower in loss, and further away from the meta-training solution.

Optimization as a Model for Few-Shot Learning

Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace

This work demonstrates that the dimension of this learned subspace reflects the complexity of the task-specific learner's adaptation task, and also that the model is less sensitive to the choice of initial learning rates than previous gradient-based meta-learning methods.

On First-Order Meta-Learning Algorithms

A family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates, including Reptile, which works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task.
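
The Reptile update described above can be sketched in a few lines (toy linear-regression tasks with assumed hyperparameters):

```python
import numpy as np

# Reptile sketch: sample a task, take a few SGD steps on it, then move
# the initialization a fraction of the way toward the trained weights --
# first-order derivatives only.

def reptile_step(w, X, y, inner_lr=0.02, inner_steps=5, eps=0.1):
    """Train on one task, then interpolate toward the trained weights."""
    w_task = w.copy()
    for _ in range(inner_steps):
        w_task -= inner_lr * 2.0 * X.T @ (X @ w_task - y) / len(y)
    return w + eps * (w_task - w)     # move the initialization

rng = np.random.default_rng(3)
w = np.zeros(2)
for _ in range(100):
    X = rng.normal(size=(16, 2))
    a = rng.normal(size=2)            # task-specific regression weights
    w = reptile_step(w, X, X @ a)
```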

Meta-SGD: Learning to Learn Quickly for Few Shot Learning

Meta-SGD, an SGD-like, easily trainable meta-learner that can initialize and adapt any differentiable learner in just one step, shows highly competitive performance for few-shot learning on regression, classification, and reinforcement learning.
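
The one-step adaptation in Meta-SGD pairs the initialization with a meta-learned vector of per-parameter step sizes. A toy first-order sketch, with the step-size vector fixed to an assumed value rather than meta-learned:

```python
import numpy as np

# Meta-SGD sketch: alongside the initialization w, a per-parameter
# vector of step sizes alpha is meta-learned; adaptation is a single
# elementwise step w' = w - alpha * grad.

def meta_sgd_adapt(w, alpha, X, y):
    """One-step adaptation with learned (elementwise) step sizes."""
    g = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - alpha * g

rng = np.random.default_rng(4)
w = np.zeros(3)
alpha = np.full(3, 0.05)              # meta-learned in the full algorithm
X = rng.normal(size=(24, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w1 = meta_sgd_adapt(w, alpha, X, y)   # adapted parameters after one step
```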