Corpus ID: 239768239

Parameter Prediction for Unseen Deep Architectures

@article{Knyazev2021ParameterPF,
  title={Parameter Prediction for Unseen Deep Architectures},
  author={Boris Knyazev and Michal Drozdzal and Graham W. Taylor and Adriana Romero-Soriano},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.13100}
}
Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study whether we can use deep learning to directly predict these parameters by exploiting the past knowledge of training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures – DEEPNETS-1M – and use it to explore parameter…
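As a rough illustration of the idea (not the paper's GHN model), the sketch below shows a tiny "graph hypernetwork" that maps a computational-graph description of an architecture, node operation types plus adjacency, to a full set of parameters in a single forward pass; the node features, the one-round message passing, and all shapes are assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of predicting parameters from an
# architecture's computational graph: embed node operation types, mix them with
# one round of message passing over the adjacency matrix, and decode one flat
# parameter vector per node, which is then sliced and reshaped to each layer's shape.
import torch
import torch.nn as nn

class TinyGraphHyperNet(nn.Module):
    def __init__(self, num_op_types=8, hidden=64, max_params_per_node=36864):
        super().__init__()
        self.op_embed = nn.Embedding(num_op_types, hidden)
        self.msg = nn.Linear(hidden, hidden)           # one round of message passing
        self.decoder = nn.Linear(hidden, max_params_per_node)

    def forward(self, op_types, adjacency, param_shapes):
        h = self.op_embed(op_types)                    # (num_nodes, hidden)
        h = h + torch.relu(adjacency @ self.msg(h))    # aggregate predecessor messages
        flat = self.decoder(h)                         # (num_nodes, max_params_per_node)
        params = []
        for i, shape in enumerate(param_shapes):
            n = int(torch.tensor(shape).prod())
            params.append(flat[i, :n].reshape(shape))  # slice and reshape per layer
        return params

# Toy "architecture": conv -> conv -> linear, encoded as a 3-node chain.
op_types = torch.tensor([0, 0, 1])
adjacency = torch.tensor([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
shapes = [(16, 3, 3, 3), (32, 16, 3, 3), (10, 32)]
predicted = TinyGraphHyperNet()(op_types, adjacency, shapes)
print([p.shape for p in predicted])
```

In the paper, such a predictor is trained across the many architectures in DEEPNETS-1M so that a single forward pass can produce parameters even for graphs never seen during training.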

Pretraining a Neural Network before Knowing Its Architecture

It is found that for recent architectures such as ConvNeXt, GHN initialization becomes less useful than for ResNet-50, and the predicted parameters lack the diversity necessary to successfully fine-tune them with gradient descent.

Meta-Ensemble Parameter Learning

WeightFormer is introduced, a Transformer-based model that predicts student network weights layer by layer in a forward pass, conditioned on the teacher model parameters; compared with knowledge distillation it can be straightforwardly extended to handle unseen teacher models, and it even exceeds the average ensemble with small-scale tuning.
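Purely as an illustration of the layer-by-layer setup (the tokenization, shapes, and two-layer encoder below are assumptions, not WeightFormer itself), one can flatten each teacher layer into a token, run a Transformer encoder over the sequence, and decode one student layer per token in a single forward pass:

```python
# Hedged sketch: teacher layers are flattened and padded into tokens, a Transformer
# encoder mixes information across layers, and a linear head decodes one student
# layer's (flat) weights per token, which are then reshaped to the target shapes.
import math
import torch
import torch.nn as nn

d_model, max_flat = 128, 256
teacher_shapes = [(16, 8), (16, 16), (4, 16)]               # toy teacher layer shapes
teacher_layers = [torch.randn(*s) for s in teacher_shapes]

to_token = nn.Linear(max_flat, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
to_weights = nn.Linear(d_model, max_flat)

tokens = torch.stack([
    to_token(nn.functional.pad(w.flatten(), (0, max_flat - w.numel())))
    for w in teacher_layers
]).unsqueeze(0)                                             # (1, num_layers, d_model)
decoded = to_weights(encoder(tokens)).squeeze(0)            # (num_layers, max_flat)
student = [decoded[i, :math.prod(s)].reshape(s) for i, s in enumerate(teacher_shapes)]
print([w.shape for w in student])
```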

Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights

The proposed layer-wise loss normalization is demonstrated to be key to generating high-performing models, and several sampling methods based on the topology of hyper-representations are shown to outperform strong baselines on several downstream tasks: initialization, ensemble sampling, and transfer learning.

Automating Neural Architecture Design without Search

Automated architecture design is studied from a new perspective that eliminates the need to sequentially evaluate each neural architecture generated during algorithm execution, which can potentially lead to a new, more computationally efficient paradigm in this research direction.

Learning to Learn with Generative Models of Neural Network Checkpoints

This model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired metric.

NeRN - Learning Neural Representations for Neural Networks

This work shows that, when adapted correctly, neural representations can be used to directly represent the weights of a pre-trained convolutional neural network, resulting in a Neural Representation for Neural Networks (NeRN).

Model Zoos: A Dataset of Diverse Populations of Neural Network Models

A novel dataset of model zoos containing systematically generated and diverse populations of neural network models is published, together with an in-depth analysis of the zoos and benchmarks for multiple downstream tasks.

Tutorial on amortized optimization for learning to optimize over continuous domains

This tutorial discusses the key design choices behind amortized optimization, roughly categorizing models into fully-amortized and semi-amortized approaches, and learning methods into regression-based and objective-based approaches.
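The distinction between the two learning methods can be made concrete with a toy sketch (the quadratic objective, the MLP, and the training loop below are illustrative assumptions): a regression-based loss fits the amortized model to known solutions y*(x), while an objective-based loss differentiates the objective f(x, y) itself through the model's prediction.

```python
# Toy amortized optimization: learn y_hat(x) that approximates argmin_y f(x, y).
# Here f(x, y) = ||y - x||^2, so the true minimizer is y*(x) = x.
import torch
import torch.nn as nn

def f(x, y):
    return ((y - x) ** 2).sum(dim=-1)

model = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    x = torch.randn(64, 2)
    y_hat = model(x)
    # regression-based signal: match the known minimizer y*(x) = x (needs solutions)
    regression_loss = ((y_hat - x) ** 2).mean()
    # objective-based signal: push f(x, y_hat) down directly (needs only the objective)
    objective_loss = f(x, y_hat).mean()
    loss = objective_loss          # train with one of the two signals; swap to compare
    opt.zero_grad()
    loss.backward()
    opt.step()

x_test = torch.randn(256, 2)
print("final objective:", f(x_test, model(x_test)).mean().item())
```

Regression-based training needs ground-truth solutions but reduces to plain supervised learning; objective-based training only requires the objective to be differentiable, which is what makes amortization attractive when solutions are expensive to compute.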

One Hyper-Initializer for All Network Architectures in Medical Image Analysis

An architecture-irrelevant hyper-initializer is proposed, which can initialize any given network architecture well after being pre-trained only once; the proposed algorithm is proved to be reusable as a favorable plug-and-play initializer for any downstream architecture and task of the same modality.

Teaching Networks to Solve Optimization Problems

This paper proposes to replace the iterative solvers altogether with a trainable parametric set function that outputs the optimal arguments/parameters of an optimization problem in a single feed-forward pass.

References

Showing 1-10 of 120 references

Neural Architecture Search with Reinforcement Learning

This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
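A stripped-down sketch of that controller loop is shown below; the four-operation search space and the simulated reward that stands in for training and validating a child network are assumptions made only to keep the example runnable.

```python
# Hedged sketch of the controller idea: an RNN emits architecture decisions token by
# token and is updated with REINFORCE, using (here, simulated) validation accuracy as
# the reward.
import torch
import torch.nn as nn

vocab = ["conv3x3", "conv5x5", "maxpool", "skip"]
controller = nn.LSTMCell(input_size=8, hidden_size=32)
embed = nn.Embedding(len(vocab), 8)
head = nn.Linear(32, len(vocab))
opt = torch.optim.Adam(
    list(controller.parameters()) + list(embed.parameters()) + list(head.parameters()),
    lr=1e-2)

def fake_validation_accuracy(arch):
    # stand-in for "train the child network described by `arch` and evaluate it"
    return 0.5 + 0.1 * arch.count("conv3x3")

for it in range(50):
    h, c = torch.zeros(1, 32), torch.zeros(1, 32)
    token = torch.zeros(1, dtype=torch.long)
    log_probs, arch = [], []
    for _ in range(6):                               # sample a 6-decision architecture
        h, c = controller(embed(token), (h, c))
        dist = torch.distributions.Categorical(logits=head(h))
        token = dist.sample()
        log_probs.append(dist.log_prob(token))
        arch.append(vocab[token.item()])
    reward = fake_validation_accuracy(arch)
    loss = -(torch.stack(log_probs).sum() * reward)  # REINFORCE: maximize expected reward
    opt.zero_grad()
    loss.backward()
    opt.step()

print("sampled architecture:", arch, "reward:", reward)
```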

Learned Optimizers that Scale and Generalize

This work introduces a learned gradient descent optimizer that generalizes well to new tasks, and which has significantly reduced memory and computation overhead, by introducing a novel hierarchical RNN architecture with minimal per-parameter overhead.

Accelerating Neural Architecture Search using Performance Prediction

Standard frequentist regression models can predict the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and time-series validation performance data; an early stopping method is also proposed, which obtains a speedup of up to a factor of 6x in both hyperparameter optimization and meta-modeling.
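As a toy illustration of that regression setup (the synthetic learning curves and the ordinary-least-squares model are assumptions, not the paper's features or models), one can predict a run's final accuracy from features of its first few validation epochs:

```python
# Predict final validation accuracy from the early part of a (synthetic) learning curve
# using ordinary least squares on simple curve features.
import numpy as np

rng = np.random.default_rng(0)

def synthetic_curve(final_acc, epochs=20):
    t = np.arange(1, epochs + 1)
    return final_acc * (1 - np.exp(-t / 5)) + rng.normal(0, 0.01, epochs)

finals = rng.uniform(0.6, 0.95, size=200)
curves = np.stack([synthetic_curve(a) for a in finals])
X = np.column_stack([curves[:, :5], np.diff(curves[:, :5], axis=1)])  # early-epoch features
X = np.column_stack([X, np.ones(len(X))])                             # bias term
w, *_ = np.linalg.lstsq(X, finals, rcond=None)
pred = X @ w
print("mean abs error on final accuracy:", np.abs(pred - finals).mean())
```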

SMASH: One-Shot Model Architecture Search through HyperNetworks

A technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture is proposed, achieving competitive performance with similarly-sized hand-designed networks.

Optimization as a Model for Few-Shot Learning

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.
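A minimal sketch of the MAML-style inner/outer loop is given below, using 1-D sine regression as the task distribution; the tiny network, single inner step, and hyperparameters are assumptions for illustration, not the paper's experimental setup.

```python
# MAML-style meta-learning sketch: one inner gradient step per task adapts the shared
# initialization, and the outer update differentiates through that inner step so the
# initialization itself becomes easy to adapt.
import math
import random
import torch

def forward(params, x):
    w1, b1, w2, b2 = params
    return torch.tanh(x @ w1 + b1) @ w2 + b2

params = [torch.randn(1, 40) * 0.1, torch.zeros(40), torch.randn(40, 1) * 0.1, torch.zeros(1)]
for p in params:
    p.requires_grad_(True)
meta_opt = torch.optim.Adam(params, lr=1e-3)
inner_lr = 0.01

for outer_step in range(100):
    meta_loss = 0.0
    for _ in range(4):                                   # a small batch of tasks
        amp, phase = random.uniform(0.1, 5.0), random.uniform(0, math.pi)
        x = torch.rand(10, 1) * 10 - 5
        y = amp * torch.sin(x + phase)
        # inner step: adapt a copy of the initialization to this task
        loss = ((forward(params, x) - y) ** 2).mean()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        fast = [p - inner_lr * g for p, g in zip(params, grads)]
        # outer objective: post-adaptation loss on fresh data from the same task
        xq = torch.rand(10, 1) * 10 - 5
        yq = amp * torch.sin(xq + phase)
        meta_loss = meta_loss + ((forward(fast, xq) - yq) ** 2).mean()
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()

print("final meta-loss:", meta_loss.item())
```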

MetaInit: Initializing learning by learning to initialize

This work introduces an algorithm called MetaInit, based on the hypothesis that good initializations make gradient descent easier by starting in regions that look locally linear with minimal second-order effects; this notion is formalized as a quantity called the gradient quotient, which MetaInit minimizes efficiently by using gradient descent to tune the norms of the initial weight matrices.

Wide Residual Networks

This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture where the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

GradInit is an automated and architecture-agnostic method for initializing neural networks based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
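The heuristic can be sketched as follows (a simplified illustration: the two-layer MLP, the single batch, and the omission of GradInit's gradient-norm constraint and Adam variant are assumptions): learn one scale per layer by differentiating the loss obtained after one simulated SGD step with respect to those scales.

```python
# Learn per-layer scale factors on a fixed random initialization so that the loss
# after a single simulated SGD step is as small as possible.
import torch
import torch.nn as nn

torch.manual_seed(0)
W = [torch.randn(8, 16), torch.randn(16, 2)]            # fixed random initialization
scales = [torch.ones(1, requires_grad=True) for _ in W]
scale_opt = torch.optim.Adam(scales, lr=1e-2)
lr_inner = 0.1
x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))

def forward(weights, x):
    return torch.relu(x @ weights[0]) @ weights[1]

for step in range(100):
    scaled = [s * w for s, w in zip(scales, W)]
    loss0 = nn.functional.cross_entropy(forward(scaled, x), y)
    grads = torch.autograd.grad(loss0, scaled, create_graph=True)
    stepped = [w - lr_inner * g for w, g in zip(scaled, grads)]
    loss1 = nn.functional.cross_entropy(forward(stepped, x), y)   # loss after one SGD step
    scale_opt.zero_grad()
    loss1.backward()                                     # differentiate w.r.t. the scales
    scale_opt.step()

print("learned per-layer scales:", [s.item() for s in scales])
```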

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to get good prediction accuracies, is proposed; it trains a single set of shared weights on ImageNet and uses these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs.
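The weight-sharing idea behind such single-stage models can be illustrated with a small sketch (the shapes and channel-slicing rule below are assumptions, not BigNAS itself): child models of different widths reuse slices of one shared layer's weights, so smaller models are obtained without retraining or post-processing.

```python
# Child models of varying width share one convolution's weights by slicing its
# output filters; no separate training or weight post-processing is needed.
import torch
import torch.nn as nn

shared_conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # "big" super-network layer

def child_forward(x, width):
    # use only the first `width` output filters of the shared weights
    w = shared_conv.weight[:width]
    b = shared_conv.bias[:width]
    return nn.functional.conv2d(x, w, b, padding=1)

x = torch.randn(1, 3, 32, 32)
for width in (16, 32, 64):                                  # child models of varying size
    print(width, child_forward(x, width).shape)
```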
...