Learning to Learn with Generative Models of Neural Network Checkpoints

William S. Peebles, Ilija Radosavovic, Tim Brooks, Alexei A. Efros, Jitendra Malik
We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired metric. At test time, it can optimize neural networks with unseen parameters for downstream tasks… 

Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights

The proposed layer-wise loss normalization is shown to be key to generating high-performing models, and several sampling methods based on the topology of hyper-representations outperform strong baselines on several downstream tasks: initialization, ensemble sampling, and transfer learning.

Model Zoos: A Dataset of Diverse Populations of Neural Network Models

A novel dataset of model zoos, containing systematically generated and diverse populations of NN models for further research, is published, together with an in-depth analysis of the zoos and benchmarks for multiple downstream tasks.

HyperTuning: Toward Adapting Large Language Models without Back-propagation

This work proposes HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model, and shows that using hypermodel-generated parameters as initializations for further parameter-efficient fine-tuning improves performance.

Scalable Diffusion Models with Transformers

The largest DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512 × 512 and 256 × 256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.

InstructPix2Pix: Learning to Follow Image Editing Instructions

The conditional diffusion model, InstructPix2Pix, is trained on generated data, generalizes to real images and user-written instructions at inference time, and shows compelling editing results for a diverse collection of input images and written instructions.

Soft Diffusion: Score Matching for General Corruptions

Replacing the Soft Score Matching loss with the loss introduced in Bansal et al. (2022), while keeping the pipeline otherwise identical, increases the FID score, demonstrating the effectiveness of the Soft Score Matching objective.

Equivariant Architectures for Learning in Deep Weight Spaces

A novel network architecture for learning in deep weight spaces is presented: it takes as input a concatenation of the weights and biases of a pre-trained MLP and processes it using a composition of layers that are equivariant to the natural permutation symmetry of the MLP’s weights.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
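
The inner-loop/outer-loop structure of this meta-learning algorithm can be sketched minimally. The sketch below is a first-order variant (FOMAML-style, which drops second derivatives) on toy 1-D regression tasks with a scalar model; all hyperparameters and the task distribution are illustrative, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, slope, xs):
    """Squared error and its gradient for the model f(x) = w * x
    on a task whose ground-truth function is y = slope * x."""
    loss = np.mean(((w - slope) * xs) ** 2)
    grad = np.mean(2 * (w - slope) * xs ** 2)
    return loss, grad

w = 4.0                     # meta-learned initialization (scalar model for clarity)
alpha, beta = 0.1, 0.05     # inner (adaptation) and outer (meta) learning rates

for _ in range(500):
    slope = rng.uniform(-2, 2)          # sample a task from the distribution
    xs = rng.uniform(-1, 1, 10)
    _, g = loss_and_grad(w, slope, xs)  # one inner adaptation step
    w_adapted = w - alpha * g
    # first-order approximation: meta-gradient evaluated at adapted parameters
    _, g_meta = loss_and_grad(w_adapted, slope, xs)
    w -= beta * g_meta

# after meta-training, a single gradient step adapts w to a new task
slope, xs = 1.5, rng.uniform(-1, 1, 10)
before, g = loss_and_grad(w, slope, xs)
after, _ = loss_and_grad(w - alpha * g, slope, xs)
```

The meta-objective optimizes the initialization so that one gradient step on a new task already helps; full MAML would differentiate through the inner step rather than approximating.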

Parameter Prediction for Unseen Deep Architectures

This work proposes a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU, and learns a strong representation of neural architectures enabling their analysis.

A Generative Model for Sampling High-Performance and Diverse Weights for Neural Networks

This work trains a neural network that serves as a hypernetwork, mapping a latent vector into high-performance (low-loss) weight vectors, generalizing recent findings of mode connectivity to higher dimensional manifolds.
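
As a drastically reduced illustration of the hypernetwork idea, the sketch below trains an affine map from a latent code to a (scalar) weight by backpropagating the task loss through the generated weight; the task, sizes, and values are illustrative, far simpler than the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)

# target task: a linear model y = w * x should fit data drawn from y = 3 * x
xs = rng.uniform(-1, 1, 32)
ys = 3.0 * xs

# hypernetwork: an affine map from a latent code z to a candidate weight w
u, b = 0.5, 0.0          # hypernetwork parameters (illustrative initial values)
lr = 0.05

for _ in range(2000):
    z = rng.normal()                          # sample a latent code
    w = u * z + b                             # generate a weight (scalar here)
    grad_w = np.mean(2 * (w * xs - ys) * xs)  # task-loss gradient w.r.t. generated weight
    u -= lr * grad_w * z                      # chain rule through the hypernetwork
    b -= lr * grad_w

# the trained hypernetwork maps latent codes to low-loss weights near w = 3
w_sample = u * 0.0 + b
```

In the paper, the latent space is higher-dimensional and regularized so that different codes yield diverse yet high-performing weight vectors; this toy version collapses to a single solution.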

Reinforcement Learning for Learning Rate Control

An algorithm is proposed that automatically learns learning rates using neural-network-based actor-critic methods from deep reinforcement learning (RL), leading to better convergence of SGD than human-designed competitors.

Learned Optimizers that Scale and Generalize

This work introduces a learned gradient descent optimizer that generalizes well to new tasks and has significantly reduced memory and computation overhead, achieved via a novel hierarchical RNN architecture with minimal per-parameter overhead.
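
The core idea of meta-training an update rule on a distribution of tasks can be caricatured without an RNN. The sketch below uses random search over the coefficients of a simple parametric update rule, judged by loss after a short unroll; this is a stand-in for the paper's gradient-based meta-training of a hierarchical RNN, and every choice here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def unrolled_loss(c, n_steps=20):
    """Final loss after optimizing random 1-D quadratics for n_steps with the
    parametric update rule  w -= c[0] * g + c[1] * sign(g),  averaged over tasks."""
    total = 0.0
    for _ in range(8):
        target = rng.uniform(-3, 3)
        w = 0.0
        for _ in range(n_steps):
            g = 2 * (w - target)
            w -= c[0] * g + c[1] * np.sign(g)
        total += (w - target) ** 2
    return total / 8

# meta-train the update-rule coefficients by random search
best_c = np.zeros(2)
best_loss = unrolled_loss(best_c)
for _ in range(200):
    c = rng.uniform([0.0, 0.0], [1.0, 0.5])
    loss = unrolled_loss(c)
    if loss < best_loss:
        best_c, best_loss = c, loss
```

The meta-objective (loss after an unrolled optimization trajectory) is the same shape as in learned-optimizer work; only the search procedure and the update rule's capacity are simplified here.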

Meta-learning with backpropagation

One of the systems, based on the long short-term memory (LSTM) neural network, developed a learning algorithm that could learn any two-dimensional quadratic function (from a set of such functions) after only 30 training examples.

Optimization as a Model for Few-Shot Learning

An LSTM-based meta-learner is proposed that learns the optimization algorithm used to train another learner neural network classifier in the few-shot regime, as well as a general initialization of the learner that allows for quick convergence.

Learning an Adaptive Learning Rate Schedule

This paper proposes a reinforcement learning based framework that can automatically learn an adaptive learning rate schedule by leveraging information from past training histories; the learned schedule changes dynamically based on the current training dynamics.
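
A minimal caricature of RL-controlled learning rates is an ε-greedy bandit over a discrete set of learning rates, rewarded by the per-step loss decrease. This is far simpler than the actor-critic frameworks in the papers above (no state, no policy network), and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy objective: f(w) = (w - 5)^2, minimized at w = 5
def grad(w):
    return 2.0 * (w - 5.0)

lrs = np.array([0.001, 0.01, 0.1, 0.5])  # discrete learning-rate actions
q = np.zeros(len(lrs))                   # running reward estimate per action
counts = np.zeros(len(lrs))
w, eps = 0.0, 0.2                        # parameter and exploration rate

for _ in range(300):
    # epsilon-greedy action selection over learning rates
    if rng.random() < eps:
        a = rng.integers(len(lrs))
    else:
        a = int(np.argmax(q))
    loss_before = (w - 5.0) ** 2
    w -= lrs[a] * grad(w)                # apply the chosen learning rate
    reward = loss_before - (w - 5.0) ** 2  # reward = achieved loss decrease
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]  # incremental mean of rewards
```

The controller quickly concentrates on learning rates that yield large loss decreases; a stateful RL formulation would additionally condition the choice on training dynamics such as recent loss curves.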

Learning to Optimize Neural Nets

An extension to Learning to Optimize is developed that is suited to learning optimization algorithms in this setting. The learned optimization algorithm consistently outperforms other known optimization algorithms, even on unseen tasks, and is robust to changes in the stochasticity of gradients and in the neural net architecture.

Understanding and correcting pathologies in the training of learned optimizers

This work proposes a training scheme that overcomes both of these difficulties by dynamically weighting two unbiased gradient estimators for a variational loss on optimizer performance, allowing neural networks to be trained to optimize a specific task faster than tuned first-order methods.