Corpus ID: 204961482

Decoupling Adaptation from Modeling with Meta-Optimizers for Meta Learning

Sébastien M. R. Arnold, Shariq Iqbal, Fei Sha
Meta-learning methods, most notably Model-Agnostic Meta-Learning (MAML), have achieved great success in adapting quickly to new tasks after having been trained on similar tasks. The mechanism behind their success, however, is poorly understood. We begin this work with an experimental analysis of MAML, finding that deep models are crucial for its success, even on sets of simple tasks where a linear model would suffice for any individual task. Furthermore, on image-recognition tasks, we find…
Multi-Stage Meta-Learning for Few-Shot with Lie Group Network Constraint
This paper proposes a novel meta-learning model called Multi-Stage Meta-Learning (MSML), which constrains the network to the Stiefel manifold so that the meta-learner can perform more stable gradient descent in a limited number of steps, accelerating the adaptation process.
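A Stiefel-manifold constraint of the kind MSML describes is commonly enforced with a QR-based retraction. The sketch below is our own minimal NumPy illustration; the function name `stiefel_retract` is not from the paper.

```python
import numpy as np

def stiefel_retract(W):
    # Retract a weight matrix onto the Stiefel manifold (matrices with
    # orthonormal columns) via QR decomposition -- one standard way to
    # enforce this kind of constraint on a network layer.
    Q, R = np.linalg.qr(W)
    # Fix column signs so the retraction is unique and stays close to W.
    return Q * np.sign(np.diag(R))
```

After each gradient step, the weights can be mapped back onto the manifold with this retraction, keeping the columns orthonormal throughout training.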
Meta-Learning and Representation Change
Model-Agnostic Meta-Learning (MAML) is one of the most representative gradient-based meta-learning algorithms. MAML learns new tasks with a few data samples using inner updates from a…
Multi-Stage Meta-Learning for Few-Shot with Lie Group Network Constraint
A novel method that uses a multi-stage joint training approach to overcome the bottleneck in the adaptation process, accelerating adaptation, and constrains the network to the Stiefel manifold so that the meta-learner can perform more stable gradient descent in a limited number of steps.
A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory.
Modular Meta-Learning with Shrinkage
This work develops general techniques based on Bayesian shrinkage to automatically discover and learn both task-specific and general reusable modules and demonstrates that this method outperforms existing meta-learning approaches in domains like few-shot text-to-speech that have little task data and long adaptation horizons.


Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning…
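MAML's two-level structure fits in a few lines. Below is a toy first-order variant (FOMAML) on a family of linear-regression tasks, in NumPy; all names are illustrative, and the full algorithm differentiates through the inner update rather than using the first-order shortcut.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, X, y):
    # Gradient of mean-squared error for a linear model X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

def sample_task():
    # Toy task family: each task is y = a * x for a random slope a.
    a = rng.uniform(-2.0, 2.0)
    def draw(n):
        X = rng.normal(size=(n, 1))
        return X, a * X[:, 0]
    return draw

def maml_step(w, inner_lr=0.05, outer_lr=0.01, n_tasks=5):
    # One meta-update: adapt on each task's support set (inner loop),
    # then average the post-adaptation query-set gradients and apply
    # them to the shared initialization (outer loop).
    meta_grad = np.zeros_like(w)
    for _ in range(n_tasks):
        draw = sample_task()
        Xs, ys = draw(10)                             # support set
        w_task = w - inner_lr * loss_grad(w, Xs, ys)  # inner adaptation
        Xq, yq = draw(10)                             # query set, same task
        meta_grad += loss_grad(w_task, Xq, yq)        # first-order meta-grad
    return w - outer_lr * meta_grad / n_tasks
```

The key design point is that the outer loss is evaluated *after* adaptation, so the initialization is trained for how well it adapts, not for how well it fits any single task.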
Meta Learning via Learned Loss
This paper presents a meta-learning method for learning parametric loss functions that can generalize across different tasks and model architectures, and develops a pipeline for “meta-training” such loss functions, targeted at maximizing the performance of the model trained under them.
Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
The ANIL (Almost No Inner Loop) algorithm is proposed: a simplification of MAML in which the inner loop is removed for all but the (task-specific) head of a MAML-trained network. Performance on the test tasks is entirely determined by the quality of the learned features, and one can even remove the head of the network (the NIL algorithm).
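ANIL's split between a frozen feature body and an adapted head can be sketched as follows; this is a NumPy toy with a random frozen body, and the names are ours rather than the paper's.

```python
import numpy as np

def anil_adapt(head, body, X, y, lr=0.01, steps=50):
    # ANIL-style inner loop: the body is frozen, so features are
    # computed once, and only the linear head is updated by gradient
    # descent on the task's support set.
    H = np.maximum(X @ body, 0.0)   # frozen body: projection + ReLU
    for _ in range(steps):
        head = head - lr * 2 * H.T @ (H @ head - y) / len(y)
    return head
```

Because only the head moves, adaptation reduces to fitting a linear model on fixed features, which is why the quality of those features determines test-task performance.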
Alpha MAML: Adaptive Model-Agnostic Meta-Learning
An extension to MAML is introduced that incorporates an online hyperparameter adaptation scheme, eliminating the need to tune the meta-learning and learning rates. Results on the Omniglot database demonstrate a substantial reduction in the need to tune MAML training hyperparameters and improved training stability, with less sensitivity to hyperparameter choice.
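The online adaptation scheme builds on hypergradient descent, which nudges the step size using the alignment of successive gradients. The following is our own simplified sketch on a plain objective, not the MAML objective, and the names are illustrative.

```python
import numpy as np

def hypergrad_sgd(grad_fn, w, alpha=0.01, beta=1e-4, steps=200):
    # Hypergradient descent: adapt the learning rate alpha online using
    # the dot product of the current and previous gradients. Alpha MAML
    # applies this idea to both of MAML's learning rates.
    g_prev = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        alpha = alpha + beta * float(g @ g_prev)  # hypergradient update
        w = w - alpha * g
        g_prev = g
    return w, alpha
```

When consecutive gradients point the same way, the dot product is positive and alpha grows; when they oscillate, it shrinks, which is what removes the manual learning-rate tuning.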
Meta-learning with differentiable closed-form solvers
The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data.
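The closed-form base learner can be written directly. Below is a NumPy sketch of ridge regression on an embedded support set; in the paper this solver sits inside the network and is backpropagated through end-to-end, and the function name here is ours.

```python
import numpy as np

def ridge_solver(H, y, lam=1.0):
    # Closed-form ridge regression on support-set embeddings H (n x d):
    #   w* = (H^T H + lam * I)^(-1) H^T y
    # The solve is differentiable, so meta-gradients can flow through
    # it back into the embedding network.
    d = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)
```

Because adaptation is a single linear solve rather than an unrolled gradient loop, the inner loop is both fast and exactly differentiable.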
Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm
This paper finds that deep representations combined with standard gradient descent have sufficient capacity to approximate any learning algorithm, and that gradient-based meta-learning consistently leads to learning strategies that generalize more widely than those represented by recurrent models.
Towards Understanding Generalization in Gradient-Based Meta-Learning
It is experimentally demonstrated that as meta-training progresses, the meta-test solutions, obtained by adapting the meta-training solution to new tasks via a few steps of gradient-based fine-tuning, become flatter, lower in loss, and further away from the meta-training solution.
Optimization as a Model for Few-Shot Learning
Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace
This work demonstrates that the dimension of this learned subspace reflects the complexity of the task-specific learner's adaptation task, and also that the model is less sensitive to the choice of initial learning rates than previous gradient-based meta-learning methods.
On First-Order Meta-Learning Algorithms
A family of algorithms for learning a parameter initialization that can be fine-tuned quickly on a new task, using only first-order derivatives for the meta-learning updates, including Reptile, which works by repeatedly sampling a task, training on it, and moving the initialization towards the trained weights on that task.
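Reptile itself fits in a few lines; here is a NumPy sketch on a linear-regression task, with function and argument names of our own choosing.

```python
import numpy as np

def reptile_step(w, draw_task, inner_lr=0.02, inner_steps=10, meta_lr=0.5):
    # One Reptile meta-update: train on a sampled task with plain SGD,
    # then move the initialization part of the way toward the adapted
    # weights -- no second derivatives, only first-order information.
    X, y = draw_task()
    w_task = w.copy()
    for _ in range(inner_steps):
        w_task = w_task - inner_lr * 2 * X.T @ (X @ w_task - y) / len(y)
    return w + meta_lr * (w_task - w)
```

Unlike MAML, there is no explicit query-set loss: the meta-update is just an interpolation toward the task-adapted weights, which is what makes the method purely first-order.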