Continuous-Time Meta-Learning with Forward Mode Differentiation

Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon
Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the… 
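The core idea, adaptation as the flow of a gradient vector field, can be sketched on a toy linear task. The snippet below is an illustrative stand-in, not the paper's method: it integrates the gradient flow of a plain least-squares loss with forward Euler, whereas COMLN meta-learns the representations and uses a proper ODE treatment. The data, step size, and integration horizon are all assumptions.

```python
import numpy as np

def gradient_flow_adapt(X, y, t_end=10.0, dt=1e-2):
    """Adapt a linear model by integrating the gradient vector field
    dw/dt = -grad L(w), where L is the mean squared error.
    Forward Euler with a small dt mimics gradient descent with
    infinitely small steps."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(int(t_end / dt)):
        grad = X.T @ (X @ w - y) / n   # gradient of the MSE loss
        w = w - dt * grad              # one Euler step along the flow
    return w

# tiny synthetic "task": recover w_star = [1, -2] from clean labels
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
w_star = np.array([1.0, -2.0])
y = X @ w_star
w = gradient_flow_adapt(X, y)
```

For a convex quadratic loss the flow converges to the least-squares solution as t_end grows, which is why the integration horizon plays the role that the number of inner gradient steps plays in discrete methods.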


Gradient-based Bi-level Optimization for Deep Learning: A Survey

This survey illustrates how to formulate a research problem as a bi-level optimization problem, which is of great practical use for beginners, and points out the great potential of gradient-based bi-level optimization for science problems (AI4Science).

Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules

This work introduces a novel combination of learning rules and Neural ODEs to build continuous-time sequence processing nets that learn to manipulate short-term memory in rapidly changing synaptic connections of other nets.

Influencing Long-Term Behavior in Multiagent Reinforcement Learning

This paper proposes a principled framework for considering the limiting policies of other agents as time approaches infinity, along with a new optimization objective that maximizes each agent’s average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will converge to.

Meta Learning in the Continuous Time Limit

The ordinary differential equation (ODE) that underlies the training dynamics of Model-Agnostic Meta-Learning is established, and a new BI-MAML training algorithm is proposed that significantly reduces the computational burden associated with existing MAML training methods.

Meta-Learning with Neural Tangent Kernels

This paper proposes the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model’s Neural Tangent Kernel (NTK), and introduces two meta-learning algorithms in the RKHS, which no longer need a sub-optimal iterative inner-loop adaptation.

Meta-Learning with Adjoint Methods

Adjoint MAML (A-MAML) is proposed, which views gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE) and optimizes the training loss of the sampled tasks via gradient descent.

Meta-Learning with Implicit Gradients

Theoretically, it is proved that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost.

Meta-learning with differentiable closed-form solvers

The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data.
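A closed-form inner solver like ridge regression replaces the iterative adaptation loop entirely. The sketch below shows the basic mechanism under simplifying assumptions (raw features standing in for a meta-learned embedding, synthetic data, an illustrative regularization strength); in the actual method the features X would be produced by the deep network being meta-trained.

```python
import numpy as np

def ridge_adapt(X, y, lam=0.1):
    """Closed-form task adaptation: solve the ridge-regression system
    (X^T X + lam I) w = X^T y instead of running an inner gradient loop.
    Because the solve is differentiable, meta-gradients can flow
    through it back to the feature extractor."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# tiny support set; features here are random stand-ins for an embedding
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
w_true = np.array([2.0, -1.0, 0.5, 0.0])
y = X @ w_true
w = ridge_adapt(X, y, lam=1e-6)
```

With a near-zero regularizer and noiseless labels, the recovered weights match the generating ones, which makes the adaptation step exact in one linear solve.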

Personalized Algorithm Generation: A Case Study in Meta-Learning ODE Integrators

Overall, this work demonstrates an effective, learning-based approach to the design of algorithms for the numerical solution of differential equations, an approach that can be readily extended to other numerical tasks.

Meta-Learning with Warped Gradient Descent

WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution and is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems.

Model-Agnostic Meta-Learning using Runge-Kutta Methods

The model-agnostic meta-learning framework introduced by Finn et al. (2017) is extended to achieve improved performance by analyzing the temporal dynamics of the optimization procedure via the Runge-Kutta method and it is demonstrated that there are multiple principled ways to update MAML.
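The connection to Runge-Kutta methods is that a discrete gradient step is just a forward-Euler step of the gradient flow, and higher-order integrators give alternative update rules. A minimal sketch of an RK4 step of that flow (the step size and test problem are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def rk4_gradient_step(w, grad, h):
    """One classical Runge-Kutta-4 step of the gradient flow
    dw/dt = -grad(w). Plain gradient descent corresponds to using
    only k1 (a single forward-Euler step of the same flow)."""
    k1 = -grad(w)
    k2 = -grad(w + 0.5 * h * k1)
    k3 = -grad(w + 0.5 * h * k2)
    k4 = -grad(w + h * k3)
    return w + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# sanity check on the linear flow dw/dt = -w, whose exact solution
# after time h is w0 * exp(-h)
w_next = rk4_gradient_step(np.array([1.0]), lambda w: w, h=0.1)
```

On this linear flow the RK4 step matches the exact exponential solution to fifth order in h, which is what makes higher-order updates attractive relative to plain Euler/gradient steps.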

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems.
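MAML's inner/outer loop structure can be illustrated on scalar quadratic tasks, where the second-order term of the meta-gradient is available in closed form. Everything below (the toy losses L_i(w) = 0.5 (w - c_i)^2, the step sizes, the loop count) is an illustrative assumption, not the paper's neural-network setup.

```python
def maml_step(w, tasks, alpha=0.1, beta=0.05):
    """One MAML meta-update on scalar quadratic tasks
    L_i(w) = 0.5 * (w - c_i)**2. The meta-gradient chains through
    the inner gradient step, including the second-order term
    L_i''(w) (which is just 1 for these quadratics)."""
    meta_grad = 0.0
    for c in tasks:
        grad_inner = w - c                   # L_i'(w)
        w_adapted = w - alpha * grad_inner   # inner-loop gradient step
        hess = 1.0                           # L_i''(w) for a quadratic
        meta_grad += (w_adapted - c) * (1.0 - alpha * hess)
    return w - beta * meta_grad / len(tasks)

# two tasks with optima at 1 and 3; the meta-parameter should settle
# at an initialization that adapts well to both
tasks = [1.0, 3.0]
w = 0.0
for _ in range(200):
    w = maml_step(w, tasks)
```

For symmetric quadratic tasks the best shared initialization is the mean of the task optima, so the iterate drifts toward 2.0; with nonlinear models the same chain rule applies but the Hessian term must be computed by automatic differentiation.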

Forward and Reverse Gradient-Based Hyperparameter Optimization

We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent.
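The forward-mode procedure propagates a tangent vector dw/dα alongside the weights, one hyperparameter direction at a time, so memory stays constant in the number of training steps. Below is a sketch for full-batch linear regression with the learning rate as the single hyperparameter; the toy data, step count, and helper names are assumptions for illustration.

```python
import numpy as np

def forward_hypergrad(X, y, Xv, yv, alpha, steps=100):
    """Forward-mode hypergradient of the validation MSE w.r.t. the
    learning rate alpha: propagate the tangent z = dw/dalpha through
    each (full-batch) gradient step, then contract with the
    validation gradient at the final weights."""
    n, d = X.shape
    H = X.T @ X / n                  # constant Hessian of the train loss
    w = np.zeros(d)
    z = np.zeros(d)                  # dw/dalpha, zero at initialization
    for _ in range(steps):
        g = X.T @ (X @ w - y) / n    # train gradient at current w
        z = z - g - alpha * (H @ z)  # tangent of w <- w - alpha * g
        w = w - alpha * g            # the update itself
    grad_val = Xv.T @ (Xv @ w - yv) / len(yv)
    return grad_val @ z              # d(val loss)/d(alpha)

def val_loss_after_training(X, y, Xv, yv, alpha, steps=100):
    """Validation MSE of the weights produced by the same training
    loop; used only to cross-check the hypergradient numerically."""
    n = len(y)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - alpha * X.T @ (X @ w - y) / n
    r = Xv @ w - yv
    return 0.5 * r @ r / len(yv)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
w_star = rng.normal(size=3)
y = X @ w_star + 0.1 * rng.normal(size=40)
Xv = rng.normal(size=(20, 3))
yv = Xv @ w_star
g = forward_hypergrad(X, y, Xv, yv, alpha=0.1)
```

The tangent recursion is just the derivative of the update rule with respect to α, so the result agrees with a finite-difference estimate of the validation loss as a function of the learning rate; reverse mode computes the same quantity but trades memory for the ability to handle many hyperparameters at once.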