Corpus ID: 235417119

A Distribution-dependent Analysis of Meta Learning

@inproceedings{Konobeev2021ADA,
  title={A Distribution-dependent Analysis of Meta Learning},
  author={Mikhail Konobeev and Ilja Kuzborskij and Csaba Szepesvari},
  booktitle={ICML},
  year={2021}
}
A key problem in the theory of meta-learning is to understand how the task distributions influence transfer risk, the expected error of a meta-learner on a new task drawn from the unknown task distribution. In this paper, focusing on fixed design linear regression with Gaussian noise and a Gaussian task (or parameter) distribution, we give distribution-dependent lower bounds on the transfer risk of any algorithm, while we also show that a novel, weighted version of the so-called biased…
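
The model described in the abstract can be made concrete with a small sketch. The snippet below is only an illustration of the kind of estimator the abstract alludes to: a ridge ("biased") regression that shrinks each new task's parameters toward a common mean estimated from earlier tasks. The variable names, the plug-in mean estimate, and the choice of regularization weight are assumptions made for this example, not the paper's exact weighted algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, T, sigma, tau = 5, 20, 50, 0.5, 0.1        # dimension, samples/task, #tasks, noise std, task std

theta_0 = rng.normal(size=d)                      # mean of the Gaussian task distribution
X = rng.normal(size=(n, d))                       # fixed design, shared across tasks

def biased_ridge(X, y, center, lam):
    """Ridge regression shrunk toward `center` instead of toward zero:
    argmin_w ||y - X w||^2 + lam * ||w - center||^2."""
    return center + np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                                    X.T @ (y - X @ center))

# Meta-training tasks: theta_t ~ N(theta_0, tau^2 I); estimate the common mean
# from the per-task ordinary least-squares solutions.
thetas = [theta_0 + tau * rng.normal(size=d) for _ in range(T)]
ys = [X @ th + sigma * rng.normal(size=n) for th in thetas]
center_hat = np.mean([np.linalg.lstsq(X, y, rcond=None)[0] for y in ys], axis=0)

# New task: shrink toward the estimated mean; lam = sigma^2 / tau^2 is the
# natural weight if the Gaussian model (and its variances) were known.
theta_new = theta_0 + tau * rng.normal(size=d)
y_new = X @ theta_new + sigma * rng.normal(size=n)
w_biased = biased_ridge(X, y_new, center_hat, lam=sigma**2 / tau**2)
w_isolated = np.linalg.lstsq(X, y_new, rcond=None)[0]
print(np.linalg.norm(w_biased - theta_new), np.linalg.norm(w_isolated - theta_new))
```

When the task standard deviation tau is small relative to the noise level sigma, the biased estimate typically lands closer to the new task's parameters than the task-in-isolation least-squares fit; this is the regime in which meta-learning is expected to help, as several of the references below also note.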

Citations

Meta Learning MDPs with Linear Transition Models
TLDR
It is proved that the proposed biased version of the UC-MatrixRL algorithm provides significant improvements in the transfer regret for task distributions of low variance and high bias compared to learning the tasks in isolation.

References

Showing 1–10 of 23 references
Few-Shot Learning via Learning the Representation, Provably
TLDR
The results demonstrate that representation learning can fully utilize all $n_1 T$ samples from the source tasks, and show the advantage of representation learning in both high-dimensional linear regression and neural network learning.
Provable Meta-Learning of Linear Representations
TLDR
This paper provides provably fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks and transferring this knowledge to new, unseen tasks, which are central to the general problem of meta-learning.
Theoretical bounds on estimation error for meta-learning
TLDR
Novel information-theoretic lower bounds on minimax rates of convergence are provided for algorithms that are trained on data from multiple sources and tested on novel data.
Bandit Algorithms
Let $E$ and $\Pi$ be sets of environments and policies respectively, and $\ell : E \times \Pi \to [0, 1]$ a bounded loss function. Given a policy $\pi$, let $\ell(\pi) = (\ell(\nu_1, \pi), \ldots, \ell(\nu_N, \pi))$ be the loss vector resulting from policy $\pi$.
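
As a hedged illustration of that excerpt (the function and variable names below are placeholders invented for this example, not the book's notation beyond ℓ, E, and Π), the loss vector is simply one policy's loss evaluated in every environment:

```python
from typing import Callable, Sequence

def loss_vector(environments: Sequence, policy, loss: Callable) -> list:
    """Return ell(pi) = (ell(nu_1, pi), ..., ell(nu_N, pi)) for a loss ell bounded in [0, 1]."""
    return [loss(nu, policy) for nu in environments]
```
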
Adaptive Gradient-Based Meta-Learning Methods
TLDR
This approach enables the task-similarity to be learned adaptively, provides sharper transfer-risk bounds in the setting of statistical learning-to-learn, and leads to straightforward derivations of average-case regret bounds for efficient algorithms in settings where the task-environment changes dynamically or the tasks share a certain geometric structure.
Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
TLDR
A key feature of the results is that, when the number of tasks grows and their variance is relatively small, the learning-to-learn approach has a significant advantage over learning each task in isolation by Stochastic Gradient Descent without a bias term.
Online Meta-Learning
TLDR
This work introduces an online meta-learning setting, which merges ideas from both the aforementioned paradigms to better capture the spirit and practice of continual lifelong learning, and proposes the follow-the-meta-leader algorithm, which extends the MAML algorithm to this setting.
Provable Guarantees for Gradient-Based Meta-Learning
TLDR
This paper develops a meta-algorithm bridging the gap between popular gradient-based meta-learning and classical regularization-based multi-task transfer methods, and is the first to simultaneously achieve good sample-efficiency guarantees in the convex setting and generalization bounds that improve with task similarity.
Learning To Learn Around A Common Mean
TLDR
It is shown that the learning-to-learn (LTL) problem can be reformulated as a least-squares (LS) problem and that a novel meta-algorithm can efficiently solve it; a bound on the generalization error of the meta-algorithm is presented, which suggests the right splitting parameter to choose.
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
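
A minimal sketch of the gradient-based meta-update that MAML describes, specialized here to linear regression so the second-order term can be written in closed form; the task distribution, step sizes, and loop length are assumptions chosen for the example, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha, beta = 3, 10, 0.05, 0.1              # dimension, samples per set, inner/outer step sizes

def draw_data(theta):
    """Draw a small regression dataset for the task with parameter vector theta."""
    X = rng.normal(size=(n, d))
    return X, X @ theta + 0.01 * rng.normal(size=n)

def grad(w, X, y):
    """Gradient of the mean squared error (1/n) * ||X w - y||^2."""
    return 2.0 / len(y) * X.T @ (X @ w - y)

w = np.zeros(d)                                    # meta-learned initialization
for _ in range(500):
    theta = np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=d)   # sample a task
    Xs, ys = draw_data(theta)                      # support set, used for the inner step
    Xq, yq = draw_data(theta)                      # query set, used for the meta update
    w_adapted = w - alpha * grad(w, Xs, ys)        # inner step: adapt to the task
    # Outer step: gradient of the query loss at w_adapted with respect to w.
    # For this quadratic loss the chain-rule factor (I - alpha * Hessian) is exact.
    H = 2.0 / n * Xs.T @ Xs
    w -= beta * (np.eye(d) - alpha * H) @ grad(w_adapted, Xq, yq)
```

After meta-training, a new task is handled by running the same inner gradient step (or a few of them) from the learned initialization w, which is the fast adaptation referred to in the title.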