Corpus ID: 6227528

IDK Cascades: Fast Deep Learning by Learning not to Overthink

@inproceedings{Wang2018IDKCF,
  title={IDK Cascades: Fast Deep Learning by Learning not to Overthink},
  author={Xin Wang and Yujia Luo and Daniel Crankshaw and Alexey Tumanov and Joseph Gonzalez},
  booktitle={UAI},
  year={2018}
}
Advances in deep learning have led to substantial increases in prediction accuracy but have been accompanied by increases in the cost of rendering predictions. We conjecture that for a majority of real-world inputs, the recent advances in deep learning have created models that effectively "overthink" on simple inputs. In this paper, we revisit the classic question of building model cascades that primarily leverage class asymmetry to reduce cost. We introduce the "I Don't Know" (IDK) prediction…
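To make the cascade idea concrete, here is a minimal sketch, assuming a two-stage cascade in which a cheap model answers only when its confidence clears a threshold and otherwise emits IDK and defers to a more expensive model. The function names, the threshold value, and the stand-in models are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Sketch of an IDK cascade (illustrative, not the paper's exact method):
# a cheap model answers when its confidence clears a threshold, otherwise it
# emits "I Don't Know" and the input is deferred to a more expensive model.

IDK = -1  # sentinel meaning "defer to the next model in the cascade"

def idk_predict(probs, threshold):
    """Return the argmax class if the max probability clears the threshold, else IDK."""
    return int(np.argmax(probs)) if np.max(probs) >= threshold else IDK

def cascade_predict(x, fast_model, accurate_model, threshold=0.9):
    """Two-stage cascade: try the fast model first, fall back on low confidence."""
    pred = idk_predict(fast_model(x), threshold)
    if pred != IDK:
        return pred, "fast"
    return int(np.argmax(accurate_model(x))), "accurate"

if __name__ == "__main__":
    # Hypothetical stand-in models that return class probabilities.
    fast_model = lambda x: np.array([0.97, 0.02, 0.01]) if x < 0.5 else np.array([0.4, 0.35, 0.25])
    accurate_model = lambda x: np.array([0.1, 0.8, 0.1])
    print(cascade_predict(0.2, fast_model, accurate_model))  # handled by the fast model
    print(cascade_predict(0.9, fast_model, accurate_model))  # deferred to the accurate model
```

The cost saving comes from how often the fast model is confident enough to answer: deferred inputs pay for both models, so the threshold trades accuracy against average inference cost.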

Why Should We Add Early Exits to Neural Networks?
TLDR
This paper provides a comprehensive introduction to this family of neural networks, by describing in a unified fashion the way these architectures can be designed, trained, and actually deployed in time-constrained scenarios.
Shallow-Deep Networks: Understanding and Mitigating Network Overthinking
TLDR
The Shallow-Deep Network (SDN) is proposed: a generic modification to off-the-shelf DNNs that introduces internal classifiers with confidence-based early exits, mitigating the wasteful effect of overthinking, reducing average inference cost by more than 50%, and preserving accuracy.
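A minimal sketch of confidence-based early exiting in this spirit appears below; the layer sizes, the placement of the internal classifier, and the confidence threshold are assumptions for illustration, not the SDN authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of an early-exit network: an internal classifier after the first block
# returns a prediction immediately when it is confident; otherwise the input
# continues to the deeper (more expensive) layers.

class EarlyExitNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.exit1 = nn.Linear(64, num_classes)  # internal classifier after block1
        self.exit2 = nn.Linear(64, num_classes)  # final classifier

    def forward(self, x, threshold=0.9):
        # x: a single example of shape (1, 32), for simplicity of the exit check
        h = self.block1(x)
        p1 = F.softmax(self.exit1(h), dim=-1)
        if p1.max(dim=-1).values.item() >= threshold:  # confident: exit early
            return p1, "exit1"
        h = self.block2(h)
        return F.softmax(self.exit2(h), dim=-1), "exit2"

model = EarlyExitNet()
probs, taken = model(torch.randn(1, 32))
print(taken, probs.shape)
```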
TAFE-Net: Task-Aware Feature Embeddings for Efficient Learning and Inference.
TLDR
This work proposes Task-Aware Feature Embedding Networks (TAFE-Nets) to learn how to adapt the image representation to a new task in a meta-learning fashion and shows that the TAFE-Net is highly effective in generalizing to new tasks or concepts and offers efficient prediction with low computational cost.
TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning
TLDR
This work proposes Task-Aware Feature Embedding Networks (TAFE-Nets) to learn how to adapt the image representation to a new task in a meta-learning fashion and shows that TAFE-Net is highly effective in generalizing to new tasks or concepts.
Zero Time Waste: Recycling Predictions in Early Exit Neural Networks
TLDR
This work introduces Zero Time Waste (ZTW), a novel approach in which each internal classifier (IC) reuses predictions returned by its predecessors by adding direct connections between ICs and combining previous outputs in an ensemble-like manner.
Differentiable Branching In Deep Networks for Fast Inference
TLDR
This paper proposes a way to jointly optimize the branching strategy together with the branches, providing an end-to-end trainable algorithm for this emerging class of neural networks by replacing the original hard output selection of the branches with a ‘soft’, differentiable approximation.
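One way to picture a "soft", differentiable exit decision is a learned gate that blends an early branch's output with the deeper network's output, so the exit policy can be trained with ordinary backpropagation. The architecture and the sigmoid gate below are assumptions for the example, not the paper's exact construction.

```python
import torch
import torch.nn as nn

# Sketch of soft, differentiable branching: a learned gate in (0, 1) weights the
# early branch against the full-depth head, making the exit decision trainable.

class SoftBranchNet(nn.Module):
    def __init__(self, dim=32, num_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(dim, 64), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.branch = nn.Linear(64, num_classes)  # early branch classifier
        self.head = nn.Linear(64, num_classes)    # final classifier
        self.gate = nn.Linear(64, 1)              # learned exit gate

    def forward(self, x):
        h1 = self.block1(x)
        g = torch.sigmoid(self.gate(h1))          # how much to trust the early branch
        return g * self.branch(h1) + (1 - g) * self.head(self.block2(h1))

model = SoftBranchNet()
print(model(torch.randn(4, 32)).shape)
```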
How to Stop Off-the-Shelf Deep Neural Networks from Overthinking
TLDR
The overthinking problem is explored, and a generic modification to off-the-shelf DNNs---the Shallow-Deep Network (SDN) is proposed, which can efficiently produce predictions from either shallow or deep layers, as appropriate for the given input.
A Survey on Green Deep Learning
TLDR
This paper focuses on presenting a systematic review of the development of Green deep learning technologies, and classifies these approaches into four categories: (1) compact networks, (2) energy-efficient training strategies, (3) energy-efficient inference approaches, and (4) efficient data usage.
Deep Mixture of Experts via Shallow Embedding
TLDR
This work explores a mixture of experts (MoE) approach to deep dynamic routing, which activates certain experts in the network on a per-example basis, and shows that Deep-MoEs are able to achieve higher accuracy with lower computation than standard convolutional networks.
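Per-example expert routing of this kind can be sketched as follows, assuming a shallow gate that scores a small pool of experts and runs only the top-k for each input; the gate design, expert count, and k are illustrative choices, not the Deep-MoE authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of per-example expert gating: a shallow gate scores the experts for
# each input and only the top-k experts are evaluated, saving computation.

class TinyMoELayer(nn.Module):
    def __init__(self, dim=32, num_experts=4, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # shallow embedding -> expert scores
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):                         # x: a single example of shape (dim,)
        weights = F.softmax(self.gate(x), dim=-1)
        topk = torch.topk(weights, self.k)
        out = torch.zeros_like(x)
        for w, idx in zip(topk.values, topk.indices):  # only the selected experts run
            out = out + w * self.experts[int(idx)](x)
        return out

layer = TinyMoELayer()
print(layer(torch.randn(32)).shape)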
...
...

References

SHOWING 1-10 OF 32 REFERENCES
Structured Prediction Cascades
TLDR
It is shown that the learned cascades are capable of reducing the complexity of inference by up to five orders of magnitude, enabling the use of models which incorporate higher order features and yield higher accuracy.
Multi-Scale Dense Convolutional Networks for Efficient Prediction
TLDR
A new convolutional neural network architecture with the ability to adapt dynamically to computational resource limits at test time and substantially improves the state-of-the-art in both settings is introduced.
NoScope: Optimizing Deep CNN-Based Queries over Video Streams at Scale
TLDR
NoScope is a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search and achieves speed-ups of two to three orders of magnitude on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.
NoScope: Optimizing Neural Network Queries over Video at Scale
TLDR
NoScope is a system for querying videos that can reduce the cost of neural network video analysis by up to three orders of magnitude via inference-optimized model search and achieves speed-ups of two to three orders of magnitude on binary classification tasks over fixed-angle webcam and surveillance video while maintaining accuracy within 1-5% of state-of-the-art neural networks.
Clipper: A Low-Latency Online Prediction Serving System
TLDR
Clipper is introduced, a general-purpose low-latency prediction serving system with a modular architecture that simplifies model deployment across frameworks and applications and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks.
Learning Complexity-Aware Cascades for Deep Pedestrian Detection
TLDR
CompACT cascades are shown to seek an optimal trade-off between accuracy and complexity by pushing features of higher complexity to the later cascade stages, where only a few difficult candidate patches remain to be classified.
Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings
TLDR
The unanimity principle is introduced: only predict when all models consistent with the training data predict the same output. This principle is operationalized for semantic parsing, the task of mapping utterances to logical forms.
End-to-End Learning of Driving Models from Large-Scale Video Datasets
TLDR
This work advocates learning a generic vehicle motion model from large scale crowd-sourced video data, and develops an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Distilling the Knowledge in a Neural Network
TLDR
This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
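The distillation objective summarized here can be written as a small loss function: the student matches the teacher's temperature-softened class probabilities while also fitting the hard labels. The temperature and mixing weight below are arbitrary illustrative choices.

```python
import torch
import torch.nn.functional as F

# Sketch of the knowledge-distillation loss: KL divergence between the student's
# and teacher's temperature-softened distributions, mixed with the usual
# cross-entropy on the hard labels.

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale to keep gradients comparable across T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels).item())
```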
...
...