# Continuous-Time Meta-Learning with Forward Mode Differentiation

```bibtex
@article{Deleu2022ContinuousTimeMW,
  title   = {Continuous-Time Meta-Learning with Forward Mode Differentiation},
  author  = {Tristan Deleu and David Kanaa and Leo Feng and Giancarlo Kerg and Yoshua Bengio and Guillaume Lajoie and Pierre-Luc Bacon},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2203.01443}
}
```

Drawing inspiration from gradient-based meta-learning methods with infinitely small gradient steps, we introduce Continuous-Time Meta-Learning (COMLN), a meta-learning algorithm where adaptation follows the dynamics of a gradient vector field. Specifically, representations of the inputs are meta-learned such that a task-specific linear classifier is obtained as a solution of an ordinary differential equation (ODE). Treating the learning process as an ODE offers the notable advantage that the…
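The inner adaptation described in the abstract, a linear classifier evolving along a gradient vector field, can be sketched with a simple forward-Euler integration of the gradient-flow ODE dw/dt = -∇L(w). This is an illustrative sketch under assumed simplifications (fixed features, logistic loss, Euler steps), not the paper's implementation, which additionally meta-learns the representations and integration time and computes meta-gradients with forward-mode differentiation:

```python
import numpy as np

def gradient_flow_adapt(X, y, t_end=5.0, dt=0.01):
    """Adapt a linear classifier by integrating the gradient-flow ODE
    dw/dt = -grad L(w) with a forward-Euler scheme.

    X: (n, d) input representations (fixed here; meta-learned in COMLN)
    y: (n,) binary labels in {0, 1}
    """
    n, d = X.shape
    w = np.zeros(d)                       # w(0) = 0 initial classifier
    for _ in range(int(t_end / dt)):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # sigmoid predictions
        grad = X.T @ (p - y) / n          # gradient of the logistic loss
        w = w - dt * grad                 # Euler step along the vector field
    return w

# Toy linearly separable binary task
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.concatenate([np.zeros(20), np.ones(20)])
w = gradient_flow_adapt(X, y)
acc = np.mean((X @ w > 0) == (y == 1))
```

As dt shrinks, the Euler iterates approach the continuous gradient-flow trajectory; the discrete gradient steps of MAML-style methods correspond to a coarse discretization of the same ODE.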

## 3 Citations

### Gradient-based Bi-level Optimization for Deep Learning: A Survey

- Computer Science, ArXiv
- 2022

This survey illustrates how to formulate a research problem as a bi-level optimization problem, which is of great practical use for beginners, and points out the potential of gradient-based bi-level optimization for science problems (AI4Science).

### Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules

- Computer Science, ArXiv
- 2022

This work introduces a novel combination of learning rules and Neural ODEs to build continuous-time sequence processing nets that learn to manipulate short-term memory in rapidly changing synaptic connections of other nets.

### Influencing Long-Term Behavior in Multiagent Reinforcement Learning

- Computer Science, ArXiv
- 2022

This paper proposes a principled framework for considering the limiting policies of other agents as time approaches infinity, along with a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will converge to.

## References

Showing 1–10 of 84 references.

### Meta Learning in the Continuous Time Limit

- Computer Science, AISTATS
- 2021

The ordinary differential equation (ODE) that underlies the training dynamics of Model-Agnostic Meta-Learning is established, and a new BI-MAML training algorithm is proposed that significantly reduces the computational burden associated with existing MAML training methods.

### Meta-Learning with Neural Tangent Kernels

- Computer Science, ICLR
- 2021

This paper proposes the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK), and introduces two meta-learning algorithms in the RKHS, which no longer need a sub-optimal iterative inner-loop adaptation.

### Meta-Learning with Adjoint Methods

- Computer Science, ArXiv
- 2021

Adjoint MAML (A-MAML) is proposed, which views gradient descent in the inner optimization as the evolution of an Ordinary Differential Equation (ODE) and optimizes the training loss of the sampled tasks via gradient descent.

### Meta-Learning with Implicit Gradients

- Computer Science, NeurIPS
- 2019

Theoretically, it is proved that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost.

### Meta-learning with differentiable closed-form solvers

- Computer Science, ICLR
- 2019

The main idea is to teach a deep network to use standard machine learning tools, such as ridge regression, as part of its own internal model, enabling it to quickly adapt to novel data.

### Personalized Algorithm Generation: A Case Study in Meta-Learning ODE Integrators

- Computer Science, ArXiv
- 2021

Overall, this work demonstrates an effective, learning-based approach to the design of algorithms for the numerical solution of differential equations, an approach that can be readily extended to other numerical tasks.

### Meta-Learning with Warped Gradient Descent

- Computer Science, ICLR
- 2020

WarpGrad meta-learns an efficiently parameterised preconditioning matrix that facilitates gradient descent across the task distribution and is computationally efficient, easy to implement, and can scale to arbitrarily large meta-learning problems.

### Model-Agnostic Meta-Learning using Runge-Kutta Methods

- Computer Science, ArXiv
- 2019

The model-agnostic meta-learning framework introduced by Finn et al. (2017) is extended to achieve improved performance by analyzing the temporal dynamics of the optimization procedure via the Runge-Kutta method and it is demonstrated that there are multiple principled ways to update MAML.

### Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

- Computer Science, ICML
- 2017

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning…

### Forward and Reverse Gradient-Based Hyperparameter Optimization

- Computer Science, ICML
- 2017

We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic…
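The forward-mode procedure studied in this reference can be sketched by propagating the tangent z = dw/d(lr) alongside the weight updates themselves, so the hypergradient is available at the end without storing the optimization trajectory. This is a minimal sketch for a quadratic loss with a single hyperparameter (the toy data and variable names are illustrative assumptions):

```python
import numpy as np

# Toy least-squares task: fit w so that X @ w ~= y
X = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]])
y = np.array([1.0, 2.0, 0.5])

def hypergradient(lr, steps=100):
    """Run SGD on a quadratic loss while carrying the tangent
    z_t = dw_t / d(lr) forward through every update (forward mode)."""
    w = np.zeros(2)
    z = np.zeros(2)                            # z_0 = dw_0/d(lr) = 0
    n = len(y)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n       # dL/dw
        hess = 2 * X.T @ X / n                 # d2L/dw2 (constant here)
        # Differentiate the update w' = w - lr * grad(w) w.r.t. lr:
        z = z - grad - lr * hess @ z
        w = w - lr * grad
    final_grad = 2 * X.T @ (X @ w - y) / n
    return w, final_grad @ z                   # chain rule: dL/dw . dw/dlr

w, dloss_dlr = hypergradient(0.05)
```

Memory stays constant in the number of steps, which is the property COMLN exploits: forward-mode differentiation through the adaptation ODE avoids storing the full inner-loop trajectory that reverse mode would require.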