Corpus ID: 15136849

Deciding How to Decide: Dynamic Routing in Artificial Neural Networks

@inproceedings{McGill2017DecidingHT,
  title={Deciding How to Decide: Dynamic Routing in Artificial Neural Networks},
  author={Mason McGill and Pietro Perona},
  booktitle={ICML},
  year={2017}
}
We propose and systematically evaluate three strategies for training dynamically-routed artificial neural networks: graphs of learned transformations through which different input signals may take different paths. Though some approaches have advantages over others, the resulting networks are often qualitatively similar. We find that, in dynamically-routed networks trained to classify images, layers and branches become specialized to process distinct categories of images. Additionally, given a… 
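As a rough illustration of what such a routing graph looks like in code, here is a minimal PyTorch sketch (not from the paper; all names are placeholders): a small learned gate scores several candidate branches, inputs are routed through a soft mixture of branches during training, and each example can be sent down only its highest-scoring branch at inference.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RoutedLayer(nn.Module):
    """One node in a routing graph: a gate chooses among candidate branches."""
    def __init__(self, dim, num_branches=3):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_branches)]
        )
        self.gate = nn.Linear(dim, num_branches)  # scores each branch per input

    def forward(self, x, hard=False):
        scores = self.gate(x)                      # (batch, num_branches)
        if hard:
            # Inference: send each example down its single best branch.
            idx = scores.argmax(dim=1)
            out = torch.stack([self.branches[i](xi) for xi, i in zip(x, idx.tolist())])
        else:
            # Training: soft (differentiable) mixture over all branches.
            weights = F.softmax(scores, dim=1)     # (batch, num_branches)
            outs = torch.stack([b(x) for b in self.branches], dim=1)  # (batch, K, dim)
            out = (weights.unsqueeze(-1) * outs).sum(dim=1)
        return out

x = torch.randn(8, 64)
layer = RoutedLayer(dim=64)
print(layer(x).shape)             # soft routing during training
print(layer(x, hard=True).shape)  # hard routing at inference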
SkipNet: Learning Dynamic Routing in Convolutional Networks
TLDR
This work introduces SkipNet, a modified residual network, that uses a gating network to selectively skip convolutional blocks based on the activations of the previous layer, and proposes a hybrid learning algorithm that combines supervised learning and reinforcement learning to address the challenges of non-differentiable skipping decisions.
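A sketch of the gating structure, assuming a PyTorch-style residual block (names are placeholders): a tiny gate reads the previous layer's pooled activations and decides whether to execute or bypass the block. SkipNet trains these hard, non-differentiable decisions with the hybrid supervised/reinforcement-learning algorithm mentioned above; the sketch shows only the forward pass with a simple threshold.

import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Residual block whose execution is controlled by a learned gate."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Gate: pooled activations of the previous layer -> execution probability.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        p_exec = self.gate(x)                            # (batch, 1)
        keep = (p_exec > 0.5).float().view(-1, 1, 1, 1)  # hard decision at inference
        # Skipped blocks contribute only the identity path.
        return x + keep * self.body(x)

x = torch.randn(4, 32, 16, 16)
block = GatedResidualBlock(32)
print(block(x).shape)  # torch.Size([4, 32, 16, 16])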
Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning
TLDR
A collaborative multi-agent reinforcement learning (MARL) approach is employed to jointly train the router and function blocks of a routing network, a kind of self-organizing neural network consisting of a router and a set of one or more function blocks.
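The router/function-block decomposition can be pictured in a few lines of PyTorch (hypothetical names; the paper trains the router with multi-agent reinforcement learning rather than the greedy argmax used here). The router sees the current representation together with a task embedding and picks which shared function block to apply at each routing step.

import torch
import torch.nn as nn

class RoutingNetwork(nn.Module):
    """A router repeatedly selects one of several shared function blocks."""
    def __init__(self, dim, num_blocks=4, num_tasks=3, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)]
        )
        # The router conditions on the representation and a task embedding.
        self.task_emb = nn.Embedding(num_tasks, dim)
        self.router = nn.Linear(2 * dim, num_blocks)
        self.depth = depth

    def forward(self, x, task_id):
        t = self.task_emb(task_id)                       # (batch, dim)
        for _ in range(self.depth):
            logits = self.router(torch.cat([x, t], dim=1))
            choice = logits.argmax(dim=1)                # greedy choice per example
            x = torch.stack([self.blocks[c](xi) for xi, c in zip(x, choice.tolist())])
        return x

x = torch.randn(8, 64)
task = torch.randint(0, 3, (8,))
net = RoutingNetwork(dim=64)
print(net(x, task).shape)  # torch.Size([8, 64])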
Dynamic Neural Networks: A Survey
TLDR
This survey comprehensively reviews the rapidly developing area of dynamic networks, dividing them into three main categories: sample-wise dynamic models that process each sample with data-dependent architectures or parameters; spatial-wise dynamic networks that conduct adaptive computation with respect to different spatial locations of image data; and temporal-wise dynamic networks that perform adaptive inference along the temporal dimension for sequential data.

BlockDrop: Dynamic Inference Paths in Residual Networks
TLDR
BlockDrop, an approach that learns to dynamically choose which layers of a deep network to execute during inference so as to best reduce total computation without degrading prediction accuracy, is introduced.
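A sketch of the idea under the assumption of a PyTorch-style residual network: a lightweight policy network looks at the input image once and emits a keep/drop decision for every residual block. The paper trains this policy with reinforcement learning to trade accuracy against computation; for clarity the sketch multiplies dropped blocks by a zero mask rather than truly skipping their computation.

import torch
import torch.nn as nn

class BlockDropResNet(nn.Module):
    def __init__(self, channels=32, num_blocks=6):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(channels, channels, 3, padding=1))
            for _ in range(num_blocks)
        ])
        # Policy network: one pass over the input -> keep probability per block.
        self.policy = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, num_blocks), nn.Sigmoid(),
        )
        self.head = nn.Linear(channels, 10)

    def forward(self, img):
        keep = torch.bernoulli(self.policy(img))  # (batch, num_blocks), sampled policy
        x = self.stem(img)
        for i, block in enumerate(self.blocks):
            k = keep[:, i].view(-1, 1, 1, 1)
            x = x + k * block(x)                   # dropped blocks reduce to identity
        return self.head(x.mean(dim=(2, 3)))       # global average pool + classifier

net = BlockDropResNet()
print(net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])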
Anytime Recognition with Routing Convolutional Networks
TLDR
This work proposes a new Routing Convolutional Network (RCN), which adaptively selects the optimal layer as exit for a specific testing sample within a specific time budget and can be adapted to dense prediction tasks, e.g., scene parsing, to achieve pixel-level anytime prediction.
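A small early-exit sketch that captures the anytime flavour (layer sizes and names are illustrative): every intermediate layer has its own classifier, and the prediction is taken from the deepest exit the budget allows. RCN additionally learns which exit to take per sample; here the budget simply caps the depth.

import torch
import torch.nn as nn

class AnytimeNet(nn.Module):
    """Backbone with a classifier ("exit") attached after every layer."""
    def __init__(self, dim=64, depth=4, num_classes=10):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)]
        )
        self.exits = nn.ModuleList([nn.Linear(dim, num_classes) for _ in range(depth)])

    def forward(self, x, max_layers=None):
        # max_layers models the time budget: stop early and use that exit.
        n = len(self.layers) if max_layers is None else min(max_layers, len(self.layers))
        for i in range(n):
            x = self.layers[i](x)
        return self.exits[n - 1](x)

net = AnytimeNet()
x = torch.randn(8, 64)
print(net(x, max_layers=2).shape)  # cheap prediction from the second exit
print(net(x).shape)                # full-depth prediction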
Learning Time-Efficient Deep Architectures with Budgeted Super Networks
TLDR
This work proposes a new family of models called Budgeted Super Networks, learned using reinforcement-learning-inspired techniques applied to a budgeted learning objective function that accounts for the cost of computation and disk/memory operations at inference.
HydraNets: Specialized Dynamic Architectures for Efficient Inference
TLDR
This paper proposes a network architecture template called HydraNet, which enables state-of-the-art architectures for image classification to be transformed into dynamic architectures which exploit conditional execution for efficient inference.
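A sketch of the template (all names illustrative): a shared stem, several specialized branches, a gate that keeps only the top-k branches per input, and a combiner that aggregates their outputs. Only the gating/combining structure is shown; as in the BlockDrop sketch, non-selected branches are masked rather than actually skipped.

import torch
import torch.nn as nn
import torch.nn.functional as F

class HydraNetSketch(nn.Module):
    def __init__(self, dim=64, num_branches=4, top_k=2, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_branches)]
        )
        self.gate = nn.Linear(dim, num_branches)     # scores branches per input
        self.combiner = nn.Linear(dim, num_classes)
        self.top_k = top_k

    def forward(self, x):
        h = self.stem(x)
        weights = F.softmax(self.gate(h), dim=1)                 # (batch, B)
        topv, topi = weights.topk(self.top_k, dim=1)             # keep only top-k branches
        mask = torch.zeros_like(weights).scatter_(1, topi, topv)
        outs = torch.stack([b(h) for b in self.branches], dim=1) # (batch, B, dim)
        combined = (mask.unsqueeze(-1) * outs).sum(dim=1)        # weighted aggregation
        return self.combiner(combined)

net = HydraNetSketch()
print(net(torch.randn(8, 64)).shape)  # torch.Size([8, 10])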
Anytime Neural Prediction via Slicing Networks Vertically
TLDR
This work first builds many inclusive thin sub-networks (of the same depth) under a minor modification of existing multi-branch DNNs, and finds that they can significantly outperform the state-of-the-art dense architecture for anytime prediction.
Deep networks with probabilistic gates
TLDR
This work proposes a per-batch loss function, describes strategies for handling probabilistic bypass during both training and inference, and explores several inference-time strategies, including the natural MAP approach.
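The probabilistic-bypass idea can be sketched as a Bernoulli gate on each layer: during training the layer is bypassed stochastically, and at inference the MAP decision executes the layer only if its gate probability exceeds 0.5. The gate parameterization below is an assumption, and the paper's per-batch loss is not reproduced.

import torch
import torch.nn as nn

class ProbabilisticBypass(nn.Module):
    """Layer wrapped in a stochastic bypass gate with MAP inference."""
    def __init__(self, dim):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.logit = nn.Parameter(torch.zeros(1))  # learned per-layer gate probability

    def forward(self, x):
        p = torch.sigmoid(self.logit)
        if self.training:
            g = torch.bernoulli(p)        # stochastic: sometimes bypass the layer
        else:
            g = (p > 0.5).float()         # MAP: deterministic decision at inference
        return (1 - g) * x + g * self.layer(x)

block = ProbabilisticBypass(64)
block.train(); print(block(torch.randn(4, 64)).shape)
block.eval();  print(block(torch.randn(4, 64)).shape)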
Reducing Catastrophic Forgetting in Modular Neural Networks by Dynamic Information Balancing
TLDR
This work investigates how to exploit modular topology in neural networks in order to dynamically balance the information load between different modules by routing inputs based on the information content in each module so that information interference is minimized.
...

References

Showing 1-10 of 33 references
Deep Sequential Neural Network
TLDR
A new neural network model in which each layer is associated with a set of candidate mappings, allowing data with different characteristics to be processed through specific sequences of such local transformations and increasing the expressive power of the model relative to a classical multilayered network.
Decision Forests, Convolutional Networks and the Models in-Between
TLDR
This paper investigates the connections between two state-of-the-art classifiers: decision forests (DFs, including decision jungles) and convolutional neural networks (CNNs), to achieve a continuum of hybrid models with different ratios of accuracy vs. efficiency.
Training Very Deep Networks
TLDR
A new architecture designed to overcome the challenges of training very deep networks, inspired by Long Short-Term Memory recurrent networks, which allows unimpeded information flow across many layers on information highways.
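The core highway layer is compact enough to sketch directly: y = T(x)·H(x) + (1 − T(x))·x, where H is the candidate transformation and T is a learned transform gate (the negative gate bias below follows the paper's suggestion of initially favouring the carry path).

import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Linear(dim, dim)   # candidate transformation
        self.T = nn.Linear(dim, dim)   # transform gate
        # Bias the gate toward carrying the input through at the start of training.
        nn.init.constant_(self.T.bias, -2.0)

    def forward(self, x):
        h = torch.relu(self.H(x))
        t = torch.sigmoid(self.T(x))
        return t * h + (1 - t) * x     # gated mix of transformation and identity

x = torch.randn(8, 64)
layer = HighwayLayer(64)
print(layer(x).shape)  # torch.Size([8, 64])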
Deep Neural Decision Forests
TLDR
A novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner by introducing a stochastic and differentiable decision tree model.
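A toy sketch of the differentiable tree (depth-2, illustrative only): split nodes emit routing probabilities via sigmoids, leaves hold learned class distributions, and the output is the probability-weighted mixture over leaves, so the tree can be trained end to end with the network that feeds it.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftDecisionTree(nn.Module):
    """Differentiable tree: sigmoid split nodes, learned class distributions at leaves."""
    def __init__(self, in_dim, num_classes, depth=2):
        super().__init__()
        self.depth = depth
        self.num_leaves = 2 ** depth
        # One sigmoid decision per internal node (there are 2^depth - 1 of them).
        self.splits = nn.Linear(in_dim, self.num_leaves - 1)
        self.leaf_logits = nn.Parameter(torch.zeros(self.num_leaves, num_classes))

    def forward(self, x):
        d = torch.sigmoid(self.splits(x))            # (batch, 2^depth - 1) "go right" probs
        leaf_prob = torch.ones(x.size(0), 1, device=x.device)
        begin = 0
        for level in range(self.depth):
            nodes = 2 ** level
            dec = d[:, begin:begin + nodes]          # one decision per node at this level
            # Each reachable node splits its probability mass between its two children.
            leaf_prob = torch.cat([leaf_prob * (1 - dec), leaf_prob * dec], dim=1)
            begin += nodes
        leaf_dist = F.softmax(self.leaf_logits, dim=1)   # (num_leaves, num_classes)
        return leaf_prob @ leaf_dist                     # mixture over leaves

tree = SoftDecisionTree(in_dim=64, num_classes=10, depth=2)
print(tree(torch.randn(8, 64)).sum(dim=1))  # each row sums to 1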
Neural Decision Forests for Semantic Image Labelling
TLDR
This work introduces randomized Multi-Layer Perceptrons (rMLPs) as new split nodes which are capable of learning non-linear, data-specific representations and taking advantage of them by finding optimal predictions for the emerging child nodes.
Network In Network
TLDR
With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
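A compact sketch of the two NIN ingredients: an "mlpconv" block (an ordinary convolution followed by 1×1 convolutions, i.e. a small MLP slid across the feature map) and global average pooling over one feature map per class in place of fully connected layers.

import torch
import torch.nn as nn

def mlpconv(in_ch, out_ch):
    # The 1x1 convolutions act as a per-location MLP on top of the ordinary convolution.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(),
    )

class TinyNIN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            mlpconv(3, 32), nn.MaxPool2d(2),
            mlpconv(32, num_classes),            # last block emits one map per class
        )

    def forward(self, x):
        maps = self.features(x)                  # (batch, num_classes, H, W)
        return maps.mean(dim=(2, 3))             # global average pooling -> logits

net = TinyNIN()
print(net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])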
Conditional Computation in Neural Networks for faster models
TLDR
This paper applies a policy gradient algorithm to learn activation policies that optimize a loss balancing prediction quality and computation cost, proposes a regularization mechanism that encourages diversification of the dropout policy, and presents encouraging empirical results showing that this approach improves the speed of computation without impacting the quality of the approximation.
FlowNet: Learning Optical Flow with Convolutional Networks
TLDR
This paper constructs CNNs which are capable of solving the optical flow estimation problem as a supervised learning task, and proposes and compares two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations.
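The correlation layer compares feature vectors of the two images across a set of spatial displacements; a brute-force sketch (small displacement range, no striding, names illustrative):

import torch
import torch.nn.functional as F

def correlation(f1, f2, max_disp=3):
    """Dot product between f1 and spatially shifted copies of f2.

    f1, f2: feature maps of shape (batch, channels, H, W) from the two images.
    Returns (batch, (2*max_disp+1)**2, H, W), one channel per displacement.
    """
    b, c, h, w = f1.shape
    f2 = F.pad(f2, [max_disp] * 4)               # pad so every shift stays in bounds
    out = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = f2[:, :, dy:dy + h, dx:dx + w]
            out.append((f1 * shifted).sum(dim=1) / c)   # normalized dot product
    return torch.stack(out, dim=1)

a = torch.randn(2, 64, 16, 16)
b = torch.randn(2, 64, 16, 16)
print(correlation(a, b).shape)  # torch.Size([2, 49, 16, 16])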
Understanding the difficulty of training deep feedforward neural networks
TLDR
The objective is to better understand why standard gradient descent from random initialization performs poorly on deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
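This reference is best known for the variance analysis that motivated the initialization scheme now usually called Xavier (Glorot) initialization; a quick sketch of that scheme, which keeps activation and gradient variances roughly constant across layers:

import math
import torch

def xavier_uniform(fan_in, fan_out):
    # Draw weights from U(-a, a) with a chosen so that activation/gradient
    # variance is preserved across layers: a = sqrt(6 / (fan_in + fan_out)).
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return torch.empty(fan_out, fan_in).uniform_(-a, a)

W = xavier_uniform(fan_in=256, fan_out=128)
print(W.std(), math.sqrt(2.0 / (256 + 128)))  # empirical std ≈ target std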
Multigrid Neural Architectures
TLDR
Together, the results suggest that continuous evolution of features on a multigrid pyramid is a more powerful alternative to existing CNN designs on a flat grid.
...