Corpus ID: 240420006

Meta-Learning to Improve Pre-Training

Aniruddh Raghu, Jonathan Lorraine, Simon Kornblith, Matthew B. A. McDermott, David Kristjanson Duvenaud
Pre-training (PT) followed by fine-tuning (FT) is an effective method for training neural networks, and has led to significant performance improvements in many domains. PT can incorporate various design choices such as task and data reweighting strategies, augmentation policies, and noise models, all of which can significantly impact the quality of representations learned. The hyperparameters introduced by these strategies therefore must be tuned appropriately. However, setting the values of… 
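The PT-then-FT pipeline the abstract describes can be illustrated with a deliberately tiny sketch (a scalar "model" on made-up data, not the paper's setup): pre-training on plentiful related data supplies an initialization that a few fine-tuning steps on scarce target data can exploit.

```python
import numpy as np

def sgd(theta, ys, steps, lr=0.1):
    # Minimize the mean squared distance to the targets ys by gradient descent.
    for _ in range(steps):
        theta -= lr * np.mean(2.0 * (theta - ys))
    return theta

rng = np.random.default_rng(0)
pretrain_ys = rng.normal(1.0, 0.1, size=1000)   # plentiful related data, mean ~1.0
finetune_ys = np.array([1.2, 1.3])              # scarce target data, mean 1.25

# PT then FT: pre-train to a good initialization, then fine-tune briefly.
theta_pt = sgd(0.0, pretrain_ys, steps=100)
theta_ft = sgd(theta_pt, finetune_ys, steps=5)

# From scratch: the same 5 fine-tuning steps without pre-training.
theta_scratch = sgd(0.0, finetune_ys, steps=5)

print(abs(theta_ft - 1.25) < abs(theta_scratch - 1.25))  # pre-training helps here
```

Hyperparameters of the pre-training stage (here just the learning rate and step count; in the paper, reweighting, augmentation, and noise choices) change `theta_pt` and hence the fine-tuned result, which is what makes tuning them matter.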


Data Augmentation for Electrocardiograms
TaskAug, a new method, defines a flexible augmentation policy optimized on a per-task basis; it is competitive with or improves on prior work, and the learned policies shed light on which transformations are most effective for different tasks.
Lyapunov Exponents for Diversity in Differentiable Games
Theoretical motivation for the method is given by leveraging machinery from the field of dynamical systems, and it is empirically evaluated by finding diverse solutions in the iterated prisoners’ dilemma and relevant machine learning problems including generative adversarial networks.
Cluster Head Detection for Hierarchical UAV Swarm With Graph Self-supervised Learning
A multi-cluster graph attention self-supervised learning algorithm (MC-GASSL) based on GASSL can efficiently detect all the HUAVs in USNETs across various IFSs and cluster numbers, with low detection redundancy.


Learning to Reweight Examples for Robust Deep Learning
This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions that can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.
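A minimal sketch of the gradient-alignment idea behind this kind of reweighting, on a scalar toy problem with hypothetical data (not the paper's actual algorithm): examples whose gradients disagree in direction with the clean validation gradient receive zero weight, so corrupted labels are ignored.

```python
import numpy as np

y_train = np.array([1.1, 0.9, 1.0, 10.0, 10.0])  # last two labels corrupted
y_val = np.array([1.0, 1.05, 0.95])              # small clean validation set

theta, lr = 0.0, 0.05
for _ in range(300):
    g_i = 2.0 * (theta - y_train)                # per-example training gradients
    g_val = np.mean(2.0 * (theta - y_val))       # validation gradient
    # Weight each example by how well its gradient aligns with the
    # validation gradient; negatively aligned examples get zero weight.
    w = np.maximum(0.0, g_val * g_i)
    w = w / w.sum() if w.sum() > 0 else np.ones_like(w) / len(w)
    theta -= lr * np.dot(w, g_i)

print(theta)  # settles near the clean mean ~1.0; corrupted examples get weight 0
```

Early on all gradients align and the corrupted examples dominate, but once `theta` passes the clean cluster their gradients oppose the validation gradient and their weights are clipped to zero for the rest of training.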
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
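MAML's inner/outer structure can be sketched on scalar quadratic tasks with hand-derived gradients (a toy setup, not the paper's experiments): each task's loss is `(theta - a)**2`, the inner step is differentiated through, and the meta-parameter converges to an initialization that adapts well to every task.

```python
def inner_update(theta, a, alpha):
    # One gradient step on the task loss f_a(theta) = (theta - a)**2.
    grad = 2.0 * (theta - a)
    return theta - alpha * grad

def meta_grad(theta, tasks, alpha):
    # Gradient of sum_a f_a(inner_update(theta, a, alpha)) w.r.t. theta;
    # the inner step is differentiated through (second-order MAML).
    # inner_update is affine: theta' - a = (1 - 2*alpha) * (theta - a).
    scale = (1.0 - 2.0 * alpha) ** 2
    return sum(2.0 * scale * (theta - a) for a in tasks)

tasks = [-1.0, 0.5, 2.0]            # each task's optimum
theta, alpha, beta = 5.0, 0.1, 0.05  # init, inner and outer learning rates
for _ in range(200):
    theta -= beta * meta_grad(theta, tasks, alpha)

print(theta)  # converges to the mean of the task optima, 0.5
```

For quadratic tasks the meta-optimum is simply the mean of the task optima; richer models are what make the learned initialization non-trivial, but the two-level gradient computation is the same.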
Meta-Learning with Latent Embedding Optimization
This work shows that latent embedding optimization can achieve state-of-the-art performance on the competitive miniImageNet and tieredImageNet few-shot classification tasks, and indicates LEO is able to capture uncertainty in the data, and can perform adaptation more effectively by optimizing in latent space.
Fast Context Adaptation via Meta-Learning
It is shown empirically that CAVIA outperforms MAML on regression, classification, and reinforcement learning problems and is easier to implement, and is more robust to the inner-loop learning rate.
Probabilistic Model-Agnostic Meta-Learning
This paper proposes a probabilistic meta-learning algorithm that can sample models for a new task from a model distribution that is trained via a variational lower bound, and shows how reasoning about ambiguity can also be used for downstream active learning problems.
Meta-Learning with Implicit Gradients
Theoretically, it is proved that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost.
Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
This work aims to adapt regularization hyperparameters for neural networks by fitting compact approximations to the best-response function, which maps hyperparameters to optimal weights and biases, and outperforms competing hyperparameter optimization methods on large-scale deep learning problems.
Optimizing Millions of Hyperparameters by Implicit Differentiation
An algorithm for inexpensive gradient-based hyperparameter optimization that combines the implicit function theorem (IFT) with efficient inverse Hessian approximations is proposed and used to train modern network architectures with millions of weights and millions of hyperparameters.
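The IFT-plus-approximate-inverse-Hessian idea can be sketched on a one-dimensional weight-decay problem (a hypothetical setup chosen because the closed-form optimum makes the hypergradient checkable): the train loss is `(theta - y)**2 + lam * theta**2`, and a Neumann series approximates the inverse Hessian in the implicit gradient.

```python
def hypergrad(theta, lam, y_val, neumann_steps=50, alpha=0.2):
    # Hypergradient dL_val/dlam at the train-loss optimum theta, via the IFT:
    #   dL_val/dlam = -(dL_val/dtheta) * H^{-1} * d2L_train/(dtheta dlam)
    H = 2.0 + 2.0 * lam              # d^2 L_train / d theta^2
    cross = 2.0 * theta              # d^2 L_train / (d theta d lam)
    v = 2.0 * (theta - y_val)        # d L_val / d theta
    # Neumann approximation of H^{-1} v: alpha * sum_k (1 - alpha*H)^k * v.
    inv_H_v, term = 0.0, v
    for _ in range(neumann_steps):
        inv_H_v += alpha * term
        term *= (1.0 - alpha * H)
    return -inv_H_v * cross

y, y_val, lam = 2.0, 1.0, 0.5
theta_star = y / (1.0 + lam)         # closed-form argmin of the train loss
approx = hypergrad(theta_star, lam, y_val)
# Exact hypergradient from theta*(lam) = y / (1 + lam), by the chain rule.
exact = 2.0 * (theta_star - y_val) * (-y / (1.0 + lam) ** 2)
print(abs(approx - exact) < 1e-3)    # True
```

In high dimensions the same recipe applies with Hessian-vector products in place of the scalar `H`, which is what keeps the memory and compute costs near those of ordinary training.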
RoBERTa: A Robustly Optimized BERT Pretraining Approach
It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Meta-Learning With Differentiable Convex Optimization
The objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories and this work exploits two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem.