Deciding How to Decide: Dynamic Routing in Artificial Neural Networks
@inproceedings{McGill2017DecidingHT,
  title     = {Deciding How to Decide: Dynamic Routing in Artificial Neural Networks},
  author    = {Mason McGill and Pietro Perona},
  booktitle = {ICML},
  year      = {2017}
}
We propose and systematically evaluate three strategies for training dynamically-routed artificial neural networks: graphs of learned transformations through which different input signals may take different paths. Though some approaches have advantages over others, the resulting networks are often qualitatively similar. We find that, in dynamically-routed networks trained to classify images, layers and branches become specialized to process distinct categories of images. Additionally, given a…
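To make the core idea concrete before the citation list: below is a minimal PyTorch sketch (not the authors' code; all module names are illustrative) of a single routing node, where a learned gate sends each input down one of two candidate transformations. The gate's hard, non-differentiable choice is exactly what makes training such networks nontrivial, and is what the paper's three training strategies address.

```python
import torch
import torch.nn as nn

class RoutedLayer(nn.Module):
    """One routing node: a gate picks one of two candidate transformations."""
    def __init__(self, dim):
        super().__init__()
        self.branches = nn.ModuleList([nn.Linear(dim, dim), nn.Linear(dim, dim)])
        self.gate = nn.Linear(dim, 2)  # one score per candidate branch

    def forward(self, x):
        # Hard routing: each sample follows its argmax branch.
        choice = self.gate(x).argmax(dim=1)                       # (batch,)
        out = torch.stack([b(x) for b in self.branches], dim=1)   # (batch, 2, dim)
        return out[torch.arange(x.size(0)), choice]               # gather chosen branch

x = torch.randn(8, 16)
print(RoutedLayer(16)(x).shape)  # torch.Size([8, 16])
```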
77 Citations
SkipNet: Learning Dynamic Routing in Convolutional Networks
- Computer Science · ECCV
- 2018
This work introduces SkipNet, a modified residual network that uses a gating network to selectively skip convolutional blocks based on the activations of the previous layer, and proposes a hybrid learning algorithm combining supervised and reinforcement learning to address the challenge of non-differentiable skipping decisions.
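A minimal sketch of the gating pattern SkipNet describes, assuming fully-connected blocks for brevity (the paper uses convolutional residual blocks): a tiny gate network looks at the incoming activations and decides, per sample, whether to execute the block or pass the input through unchanged. The hard threshold below is the inference-time behavior; training the non-differentiable decision is what SkipNet's hybrid supervised/reinforcement scheme handles.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Residual block whose execution is controlled by a tiny gate network."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, x):
        g = self.gate(x)                # gating probability from the input activations
        skip = (g < 0.5).squeeze(1)     # hard decision at inference: skip or execute
        out = x.clone()
        if (~skip).any():
            out[~skip] = x[~skip] + self.body(x[~skip])  # run the block where kept
        return out
```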
Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning
- Computer Science · ICLR
- 2018
A collaborative multi-agent reinforcement learning (MARL) approach is employed to jointly train the router and function blocks of a routing network, a kind of self-organizing neural network consisting of a router and a set of one or more function blocks.
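In sketch form (ours, not the paper's implementation), a routing network of this kind might look as follows: at each depth, the router scores the available function blocks and a per-sample choice determines which block transforms the state. In the paper the router is trained as an agent with multi-agent RL rather than by backpropagating through the sampled choice.

```python
import torch
import torch.nn as nn

class RoutingNetwork(nn.Module):
    """A router repeatedly selects one function block to apply to the state."""
    def __init__(self, dim, n_blocks=3, depth=2):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_blocks)])
        self.router = nn.Linear(dim, n_blocks)  # the router's policy head
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            probs = self.router(x).softmax(dim=1)
            action = torch.distributions.Categorical(probs).sample()  # per-sample choice
            x = torch.stack([self.blocks[int(a)](xi) for a, xi in zip(action, x)])
            x = torch.relu(x)
        return x

print(RoutingNetwork(16)(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```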
Dynamic Neural Networks: A Survey
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021
This survey comprehensively reviews the rapidly developing area of dynamic networks, dividing them into three main categories: sample-wise dynamic models that process each sample with data-dependent architectures or parameters; spatial-wise dynamic networks that conduct adaptive computation with respect to different spatial locations of image data; and temporal-wise dynamic networks that perform adaptive inference along the temporal dimension for sequential data.
BlockDrop: Dynamic Inference Paths in Residual Networks
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
BlockDrop, an approach that learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy, is introduced.
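The distinctive feature of BlockDrop is that a single forward pass of a policy network emits keep/drop decisions for all residual blocks at once, rather than gating block-by-block. A hedged sketch of that one-shot policy (names and shapes are ours):

```python
import torch
import torch.nn as nn

class BlockDropPolicy(nn.Module):
    """One forward pass of the policy picks which residual blocks to run."""
    def __init__(self, dim, n_blocks):
        super().__init__()
        self.policy = nn.Linear(dim, n_blocks)  # one keep/drop logit per block

    def forward(self, x):
        probs = torch.sigmoid(self.policy(x))
        return torch.bernoulli(probs)  # sampled binary mask, shape (batch, n_blocks)

# Usage idea: run block i only for samples whose mask[:, i] == 1, then reward the
# policy (e.g. with REINFORCE) for correct predictions that use few blocks.
mask = BlockDropPolicy(16, n_blocks=6)(torch.randn(4, 16))
print(mask.shape)  # torch.Size([4, 6])
```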
Anytime Recognition with Routing Convolutional Networks
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021
This work proposes a new Routing Convolutional Network (RCN), which adaptively selects the optimal layer as exit for a specific testing sample within a specific time budget and can be adapted to dense prediction tasks, e.g., scene parsing, to achieve the pixel-level anytime prediction.
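A simplified stand-in for the anytime behavior (not RCN's learned exit policy): attach a classifier head after each layer and emit the prediction of the deepest exit reachable within the time budget.

```python
import torch
import torch.nn as nn

def anytime_predict(features, exits, budget):
    """Run layers in order; return the prediction of the deepest exit
    affordable within `budget` layers (a stand-in for a learned policy)."""
    x, logits = features, None
    for i, (layer, head) in enumerate(exits):
        x = torch.relu(layer(x))
        logits = head(x)
        if i + 1 >= budget:   # time budget exhausted: emit current prediction
            break
    return logits

layers = [(nn.Linear(16, 16), nn.Linear(16, 10)) for _ in range(4)]
print(anytime_predict(torch.randn(2, 16), layers, budget=2).shape)  # (2, 10)
```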
Learning Time-Efficient Deep Architectures with Budgeted Super Networks
- Computer Science · ArXiv
- 2017
This work proposes a new family of models called Budgeted Super Networks, learned with reinforcement-learning-inspired techniques applied to a budgeted learning objective that includes the cost of computation and disk/memory operations at inference.
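In sketch form, a budgeted objective of this kind combines the task loss with a penalty on expected computation cost (notation ours, not the paper's):

```latex
% ell: task loss; f_theta: the network; C: compute/memory cost on input x;
% lambda: coefficient trading accuracy against the budget.
\mathcal{L}(\theta) \;=\;
\mathbb{E}_{(x,y)}\!\left[\,\ell\big(f_\theta(x),\, y\big)\right]
\;+\; \lambda\,\mathbb{E}\!\left[\,C(f_\theta, x)\right]
```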
HydraNets: Specialized Dynamic Architectures for Efficient Inference
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This paper proposes a network architecture template called HydraNet, which enables state-of-the-art architectures for image classification to be transformed into dynamic architectures which exploit conditional execution for efficient inference.
Anytime Neural Prediction via Slicing Networks Vertically
- Computer Science · ArXiv
- 2018
This work first builds many inclusive thin sub-networks (of the same depth) with a minor modification of existing multi-branch DNNs, and finds that they can significantly outperform state-of-the-art dense architectures for anytime prediction.
Deep networks with probabilistic gates
- Computer Science · ArXiv
- 2018
This work proposes a per-batch loss function, and describes strategies for handling probabilistic bypass during inference as well as training, and explores several inference-time strategies, including the natural MAP approach.
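A hedged sketch of the train/test asymmetry this entry describes: sample the Bernoulli bypass gate during training, and take the most probable (MAP) gate value at inference. The sampled gate blocks gradients to the gate parameter, which is why the paper needs its per-batch loss and inference strategies; the module below only illustrates the forward behavior.

```python
import torch
import torch.nn as nn

class ProbabilisticBypass(nn.Module):
    """Train with a sampled Bernoulli bypass; use the MAP decision at test time."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Linear(dim, dim)
        self.gate_logit = nn.Parameter(torch.zeros(1))  # per-layer gate probability

    def forward(self, x):
        p = torch.sigmoid(self.gate_logit)
        if self.training:
            g = torch.bernoulli(p.expand(x.size(0), 1))  # stochastic gate
        else:
            g = (p > 0.5).float().expand(x.size(0), 1)   # MAP: most probable value
        return g * self.body(x) + (1 - g) * x
```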
Reducing Catastrophic Forgetting in Modular Neural Networks by Dynamic Information Balancing
- Computer Science · ArXiv
- 2019
This work investigates how to exploit modular topology in neural networks in order to dynamically balance the information load between different modules by routing inputs based on the information content in each module so that information interference is minimized.
References
Showing 1-10 of 33 references
Deep Sequential Neural Network
- Computer Science · NIPS
- 2014
A new neural network model in which each layer is associated with a set of candidate mappings, enabling data with different characteristics to be processed through specific sequences of such local transformations and increasing the model's expressive power relative to a classical multilayered network.
Decision Forests, Convolutional Networks and the Models in-Between
- Computer Science · ArXiv
- 2016
This paper investigates the connections between two state-of-the-art classifiers, decision forests (DFs, including decision jungles) and convolutional neural networks (CNNs), to achieve a continuum of hybrid models with different ratios of accuracy vs. efficiency.
Training Very Deep Networks
- Computer Science · NIPS
- 2015
A new architecture, inspired by Long Short-Term Memory recurrent networks, designed to overcome the challenges of training very deep networks by allowing unimpeded information flow across many layers along "information highways".
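The highway mechanism itself is compact: a transform gate T(x) interpolates between the layer's transformation H(x) and the unchanged input, y = T(x) * H(x) + (1 - T(x)) * x. A minimal PyTorch rendering (the gate-bias initialization value is illustrative):

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x, with a learned transform gate T."""
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Linear(dim, dim)
        self.T = nn.Linear(dim, dim)
        nn.init.constant_(self.T.bias, -2.0)  # bias gates toward carrying x through

    def forward(self, x):
        t = torch.sigmoid(self.T(x))
        return t * torch.relu(self.H(x)) + (1 - t) * x

print(HighwayLayer(16)(torch.randn(4, 16)).shape)  # torch.Size([4, 16])
```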
Deep Neural Decision Forests
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
A novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks by introducing a stochastic, differentiable decision tree model that can be trained end-to-end.
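The key trick is making tree routing stochastic and differentiable: each split node outputs a sigmoid decision, and the probability of reaching a leaf is the product of the branch decisions along its root-to-leaf path. A small sketch for a depth-2 tree (shapes ours):

```python
import torch

def leaf_probabilities(x, split_weights):
    """Soft routing in a depth-2 binary tree: each split node computes
    d = sigmoid(w . x); a leaf's probability is the product of the
    branch decisions along its root-to-leaf path."""
    d = torch.sigmoid(x @ split_weights)     # (batch, 3): split nodes 0, 1, 2
    d0, d1, d2 = d[:, 0], d[:, 1], d[:, 2]
    return torch.stack([d0 * d1,             # left-left leaf
                        d0 * (1 - d1),       # left-right leaf
                        (1 - d0) * d2,       # right-left leaf
                        (1 - d0) * (1 - d2)], dim=1)  # right-right leaf

probs = leaf_probabilities(torch.randn(4, 16), torch.randn(16, 3))
print(probs.sum(dim=1))  # all ones: a proper distribution over leaves
```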
Neural Decision Forests for Semantic Image Labelling
- Computer Science · 2014 IEEE Conference on Computer Vision and Pattern Recognition
- 2014
This work introduces randomized Multi-Layer Perceptrons (rMLPs) as new split nodes which are capable of learning non-linear, data-specific representations and taking advantage of them by finding optimal predictions for the emerging child nodes.
Network In Network
- Computer Science · ICLR
- 2014
With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
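The global-average-pooling classification head is easy to state concretely: a 1x1 convolution produces one feature map per class, and spatial averaging replaces the fully connected layers. A minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

# NIN-style classification head: 10 class-score maps, then global average pooling.
head = nn.Sequential(
    nn.Conv2d(64, 10, kernel_size=1),   # one feature map per class
    nn.AdaptiveAvgPool2d(1),            # global average pool -> (batch, 10, 1, 1)
    nn.Flatten(),                       # (batch, 10) logits
)
print(head(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 10])
```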
Conditional Computation in Neural Networks for faster models
- Computer Science · ArXiv
- 2015
This paper applies a policy gradient algorithm to learn policies that optimize the proposed loss function, proposes a regularization mechanism that encourages diversification of the dropout policy, and presents encouraging empirical results showing that the approach improves computation speed without impacting the quality of the approximation.
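The policy-gradient part can be sketched with a REINFORCE-style surrogate loss (a generic rendering, not the paper's exact objective): raise the log-probability of the sampled gating decisions in proportion to a detached reward, such as negative task loss minus a computation penalty.

```python
import torch

def reinforce_gate_loss(gate_probs, gate_samples, reward):
    """REINFORCE-style surrogate: scale the log-probability of the sampled
    gate decisions by the (detached) per-sample reward."""
    log_p = torch.distributions.Bernoulli(gate_probs).log_prob(gate_samples)
    return -(reward.detach().unsqueeze(1) * log_p).mean()

probs = torch.rand(8, 4, requires_grad=True)  # per-sample, per-unit keep probabilities
samples = torch.bernoulli(probs).detach()     # sampled gating decisions
loss = reinforce_gate_loss(probs, samples, reward=torch.randn(8))
loss.backward()                               # gradients flow to the gate probabilities
```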
FlowNet: Learning Optical Flow with Convolutional Networks
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
This paper constructs CNNs which are capable of solving the optical flow estimation problem as a supervised learning task, and proposes and compares two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations.
Understanding the difficulty of training deep feedforward neural networks
- Computer Science · AISTATS
- 2010
The objective is to better understand why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
Multigrid Neural Architectures
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
Together, the results suggest that continuous evolution of features on a multigrid pyramid is a more powerful alternative to existing CNN designs on a flat grid.