Corpus ID: 3489117

SMASH: One-Shot Model Architecture Search through HyperNetworks

@article{Brock2018SMASHOM,
  title={SMASH: One-Shot Model Architecture Search through HyperNetworks},
  author={Andrew Brock and Theodore Lim and James M. Ritchie and Nick Weston},
  journal={ArXiv},
  year={2018},
  volume={abs/1708.05344}
}
Designing architectures for deep neural networks requires expert knowledge and substantial computation time. [...] To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with similarly-sized hand-designed…
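A minimal sketch of the one-shot idea summarized above, assuming a toy search space of single-hidden-layer MLPs whose hidden width varies: an auxiliary network maps an architecture encoding to the candidate's weights, is trained over randomly sampled candidates, and then ranks candidates by validation loss under its generated weights. The HyperNet class, the width-only search space, and the synthetic data are illustrative stand-ins, not the paper's memory read-write mechanism.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy search space: single-hidden-layer MLPs whose hidden width varies.
WIDTHS = [16, 32, 64, 128]
MAX_W = max(WIDTHS)
IN_DIM, OUT_DIM = 20, 5

class HyperNet(nn.Module):
    """Maps a one-hot architecture encoding to the weights of a candidate MLP."""
    def __init__(self):
        super().__init__()
        self.w1_gen = nn.Linear(len(WIDTHS), MAX_W * IN_DIM)
        self.w2_gen = nn.Linear(len(WIDTHS), OUT_DIM * MAX_W)

    def forward(self, arch_idx, x):
        enc = F.one_hot(torch.tensor(arch_idx), len(WIDTHS)).float()
        width = WIDTHS[arch_idx]
        # Generate oversized weight tensors, then slice to the sampled width.
        w1 = self.w1_gen(enc).view(MAX_W, IN_DIM)[:width]
        w2 = self.w2_gen(enc).view(OUT_DIM, MAX_W)[:, :width]
        return F.linear(F.relu(F.linear(x, w1)), w2)

def one_shot_search(hyper, train_batch, val_batch, steps=200):
    opt = torch.optim.Adam(hyper.parameters(), lr=1e-3)
    # Phase 1: train the hypernetwork on randomly sampled architectures.
    for _ in range(steps):
        x, y = train_batch()
        arch = torch.randint(len(WIDTHS), ()).item()
        loss = F.cross_entropy(hyper(arch, x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    # Phase 2: rank every candidate by validation loss under generated weights.
    x, y = val_batch()
    with torch.no_grad():
        scores = [F.cross_entropy(hyper(a, x), y).item() for a in range(len(WIDTHS))]
    return min(range(len(WIDTHS)), key=lambda a: scores[a])

if __name__ == "__main__":
    def batch():  # synthetic stand-in for real training/validation data
        x = torch.randn(64, IN_DIM)
        return x, (x.sum(dim=1) > 0).long()
    best = one_shot_search(HyperNet(), batch, batch)
    print("selected hidden width:", WIDTHS[best])

The two phases mirror the general recipe: train a single hypernetwork across many sampled architectures, then use its generated weights as a cheap proxy for ranking candidates before training the selected architecture normally.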
Efficient Neural Architecture Search via Parameter Sharing
TLDR
Efficient Neural Architecture Search is a fast and inexpensive approach for automatic model design that establishes a new state of the art among all methods without post-training processing and delivers strong empirical performance using far fewer GPU-hours.
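A rough sketch of the parameter-sharing idea behind this approach, assuming a toy space in which each layer position picks one of two candidate ops from a single shared pool; every sampled child architecture reuses and updates the same parameters. SharedLayer, the two-op pool, and the uniform random sampling are hypothetical simplifications (the method itself trains an RNN controller with reinforcement learning to choose child architectures).

import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedLayer(nn.Module):
    """One position in the network, with a shared pool of candidate ops."""
    def __init__(self, dim):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Linear(dim, dim),                            # candidate op 0
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh()),  # candidate op 1
        ])

    def forward(self, x, choice):
        return F.relu(self.ops[choice](x))

class SharedModel(nn.Module):
    def __init__(self, dim=32, depth=3, n_classes=5):
        super().__init__()
        self.layers = nn.ModuleList([SharedLayer(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x, arch):
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return self.head(x)

model = SharedModel()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
x, y = torch.randn(16, 32), torch.randint(5, (16,))  # synthetic batch
for step in range(100):
    arch = [random.randrange(2) for _ in model.layers]  # sample a child network
    loss = F.cross_entropy(model(x, arch), y)
    opt.zero_grad(); loss.backward(); opt.step()

Because all children draw on one shared weight pool, evaluating a new child does not require training it from scratch, which is the source of the GPU-hour savings noted above.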
Understanding and Simplifying One-Shot Architecture Search
TLDR
With careful experimental analysis, it is shown that it is possible to efficiently identify promising architectures from a complex search space without either hypernetworks or reinforcement learning controllers.
BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
TLDR
BigNAS, an approach that challenges the conventional wisdom that post-processing of the weights is necessary to obtain good prediction accuracy, trains a single set of shared weights on ImageNet and uses these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs.
Task-Adaptive Neural Network Search with Meta-Contrastive Learning
TLDR
Given a model zoo that consists of networks pretrained on diverse datasets, the results show that the TANS method instantly retrieves networks that outperform models obtained with the baselines, with significantly fewer training steps to reach the target performance, thus minimizing the total cost of obtaining a task-optimal network.
Neural Architecture Search for a Highly Efficient Network with Random Skip Connections
TLDR
This work devises a novel cell structure that requires less memory and computational power than long short-term memory (LSTM) structures, and applies a special initialization scheme to the cell parameters that permits unhindered gradient propagation along the time axis at the beginning of training.
Neural Architecture Search by Estimation of Network Structure Distributions
TLDR
The algorithm is shown to discover non-regular models which cannot be expressed via blocks, but are competitive both in accuracy and computational cost, while not utilizing complex dataflows or advanced training techniques, as well as remaining conceptually simple and highly extensible.
Learning to Search Efficient DenseNet with Layer-wise Pruning
TLDR
This work designs a reinforcement learning framework to search for efficient DenseNet architectures with layer-wise pruning (LWP) for different tasks, while retaining the original advantages of DenseNet, such as feature reuse and short paths.
Graph HyperNetworks for Neural Architecture Search
TLDR
The GHN is proposed to amortize the search cost: given an architecture, it directly generates the weights by running inference on a graph neural network, and it predicts network performance more accurately than regular hypernetworks or premature early stopping.
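A heavily simplified sketch of the graph-hypernetwork idea: an architecture is encoded as a graph of op nodes, a few rounds of message passing refine the node embeddings, and a per-node decoder emits that node's weight tensor. TinyGraphHyperNet, the op vocabulary, the dense adjacency matrix, and the tensor shapes are invented for illustration and do not follow the paper's construction.

import torch
import torch.nn as nn

class TinyGraphHyperNet(nn.Module):
    def __init__(self, n_op_types=4, hidden=64, weight_numel=16 * 16 * 3 * 3):
        super().__init__()
        self.embed = nn.Embedding(n_op_types, hidden)   # one embedding per op type
        self.msg = nn.Linear(hidden, hidden)            # message function
        self.upd = nn.GRUCell(hidden, hidden)           # node state update
        self.decode = nn.Linear(hidden, weight_numel)   # per-node weight decoder

    def forward(self, op_types, adj, rounds=3):
        h = self.embed(op_types)                        # (n_nodes, hidden)
        for _ in range(rounds):
            m = adj @ self.msg(h)                       # aggregate neighbour messages
            h = self.upd(m, h)
        # Decode each node embedding into that node's conv weight tensor.
        return self.decode(h).view(len(op_types), 16, 16, 3, 3)

op_types = torch.tensor([0, 2, 2, 1, 3])                # a toy 5-node architecture
adj = (torch.rand(5, 5) < 0.4).float()                  # toy connectivity
weights = TinyGraphHyperNet()(op_types, adj)
print(weights.shape)                                    # torch.Size([5, 16, 16, 3, 3])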
Disentangled Neural Architecture Search
TLDR
This paper proposes disentangled neural architecture search (DNAS), which disentangles the hidden representation of the controller into semantically meaningful concepts, making the neural architecture search process interpretable, and proposes a dense-sampling strategy to conduct a targeted search in promising regions that may generate well-performing architectures.
Parameter Prediction for Unseen Deep Architectures
TLDR
This work proposes a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU, and learns a strong representation of neural architectures enabling their analysis.

References

SHOWING 1-10 OF 43 REFERENCES
Swapout: Learning an ensemble of deep architectures
TLDR
This work describes Swapout, a new stochastic training method that outperforms ResNets of identical network structure, yielding impressive results on CIFAR-10 and CIFAR-100, and proposes a parameterization that reveals connections to existing architectures and suggests a much richer set of architectures to be explored.
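A small sketch of a Swapout-style stochastic unit, in which the identity path and the residual path are each kept or dropped independently per activation, so dropout, stochastic depth, and a plain residual unit appear as special cases. The keep probabilities and the deterministic inference rule below are illustrative choices; the paper also discusses stochastic (sampling-based) inference.

import torch
import torch.nn as nn

class SwapoutUnit(nn.Module):
    def __init__(self, block, p_skip=0.8, p_res=0.8):
        super().__init__()
        self.block, self.p_skip, self.p_res = block, p_skip, p_res

    def forward(self, x):
        fx = self.block(x)
        if self.training:
            # Independent Bernoulli masks on the skip and residual paths.
            m_skip = torch.bernoulli(torch.full_like(x, self.p_skip))
            m_res = torch.bernoulli(torch.full_like(fx, self.p_res))
            return m_skip * x + m_res * fx
        # Simple deterministic approximation at inference time.
        return self.p_skip * x + self.p_res * fx

unit = SwapoutUnit(nn.Sequential(nn.Linear(16, 16), nn.ReLU()))
print(unit(torch.randn(4, 16)).shape)   # torch.Size([4, 16])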
Neural Architecture Search with Reinforcement Learning
TLDR
This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.
HyperNetworks
This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. We apply hypernetworks to generate adaptive weights for…
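A minimal sketch of the core idea for a single target linear layer, assuming a learned low-dimensional embedding that a small generator maps to the layer's weight matrix; only the embedding and the generator carry trainable parameters, and the class name and sizes are arbitrary.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """A linear layer whose weight matrix is produced by a small hypernetwork
    from a learned layer embedding, rather than stored directly."""
    def __init__(self, in_features=32, out_features=10, z_dim=8):
        super().__init__()
        self.z = nn.Parameter(torch.randn(z_dim))                 # layer embedding
        self.gen = nn.Linear(z_dim, out_features * in_features)   # weight generator
        self.shape = (out_features, in_features)

    def forward(self, x):
        w = self.gen(self.z).view(self.shape)   # generated weight matrix
        return F.linear(x, w)

layer = HyperLinear()
print(layer(torch.randn(4, 32)).shape)          # torch.Size([4, 10])

One appeal of the scheme is that a single generator can be shared across many layers; a single layer keeps the sketch short here.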
Designing Neural Network Architectures using Reinforcement Learning
TLDR
MetaQNN is introduced, a meta-modeling algorithm based on reinforcement learning to automatically generate high-performing CNN architectures for a given learning task that beat existing networks designed with the same layer types and are competitive against the state-of-the-art methods that use more complex layer types.
Wide Residual Networks
TLDR
This paper conducts a detailed experimental study on the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting network structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.
Learning multiple visual domains with residual adapters
TLDR
This paper develops a tunable deep network architecture that, by means of adapter residual modules, can be steered on the fly to diverse visual domains, and introduces the Visual Decathlon Challenge, a benchmark that evaluates the ability of representations to capture ten very different visual domains simultaneously and to recognize them uniformly well.
Dynamic Filter Networks
TLDR
The Dynamic Filter Network is introduced, where filters are generated dynamically conditioned on an input, and it is shown that this architecture is a powerful one, with increased flexibility thanks to its adaptive nature, yet without an excessive increase in the number of model parameters.
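A compact sketch of input-conditioned filtering, assuming a small filter-generating branch that pools the input and emits a per-sample filter bank, which is then applied with a grouped convolution so each sample gets its own filters. DynamicFilterConv and its sizes are illustrative, not the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterConv(nn.Module):
    def __init__(self, channels=8, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        # Filter-generating branch: global pool -> linear -> per-sample filters.
        self.gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels * channels * k * k))

    def forward(self, x):
        b, c, h, w = x.shape
        filters = self.gen(x).view(b * self.channels, self.channels, self.k, self.k)
        # Grouped-conv trick: fold the batch into channels so every sample
        # is convolved with its own dynamically generated filter bank.
        out = F.conv2d(x.reshape(1, b * c, h, w), filters,
                       padding=self.k // 2, groups=b)
        return out.view(b, self.channels, h, w)

x = torch.randn(2, 8, 16, 16)
print(DynamicFilterConv()(x).shape)   # torch.Size([2, 8, 16, 16])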
An Analysis of Single-Layer Networks in Unsupervised Feature Learning
TLDR
The results show that large numbers of hidden nodes and dense feature extraction are critical to achieving high performance—so critical, in fact, that when these parameters are pushed to their limits, they achieve state-of-the-art performance on both CIFAR-10 and NORB using only a single layer of features.
Learning Transferable Architectures for Scalable Image Recognition
TLDR
This paper proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset and introduces a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.
Net2Net: Accelerating Learning via Knowledge Transfer
TLDR
The Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network, and demonstrates a new state of the art accuracy rating on the ImageNet dataset.
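A small sketch of the function-preserving widening direction of the technique (Net2WiderNet), written in NumPy for clarity: each new hidden unit copies an existing one and the outgoing weights are rescaled by the copy count, so the widened network computes exactly the same function before further training. The helper name and shapes are illustrative.

import numpy as np

def net2wider(w1, b1, w2, new_width):
    """Widen the hidden layer of a two-layer ReLU net from w1.shape[1] units
    to new_width units without changing the function it computes.
    w1: (in, hidden), b1: (hidden,), w2: (hidden, out)."""
    hidden = w1.shape[1]
    # Each new unit index maps to an existing unit that it copies.
    mapping = np.concatenate([np.arange(hidden),
                              np.random.randint(hidden, size=new_width - hidden)])
    counts = np.bincount(mapping, minlength=hidden)   # times each unit is reused
    new_w1 = w1[:, mapping]
    new_b1 = b1[mapping]
    new_w2 = w2[mapping, :] / counts[mapping][:, None]  # rescale to preserve outputs
    return new_w1, new_b1, new_w2

rng = np.random.default_rng(0)
w1, b1, w2 = rng.normal(size=(4, 6)), rng.normal(size=6), rng.normal(size=(6, 3))
x = rng.normal(size=(5, 4))
W1, B1, W2 = net2wider(w1, b1, w2, new_width=9)
old = np.maximum(x @ w1 + b1, 0) @ w2
new = np.maximum(x @ W1 + B1, 0) @ W2
print(np.allclose(old, new))   # True: the widened net matches the original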