Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning

Arian Bakhtiarnia, Qi Zhang, Alexandros Iosifidis
2021 International Joint Conference on Neural Networks (IJCNN)

Deploying deep learning services for time-sensitive and resource-constrained settings such as IoT using edge computing systems is a challenging task that requires dynamic adjustment of inference time. Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy. To mitigate this cost, in this paper we introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning, a training… 
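
To make the idea of terminating early to meet a deadline concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the names `last_affordable_exit` and `predict_within_budget`, the per-stage cost list, and the toy stages are all hypothetical, and the sketch assumes the cost of each backbone stage is known in advance and that one exit head is attached after every stage.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only.
Stage = Callable[[List[float]], List[float]]     # backbone block: features -> features
ExitHead = Callable[[List[float]], List[float]]  # exit branch: features -> class scores


def last_affordable_exit(stage_costs: Sequence[float], budget: float) -> int:
    """Return the index of the deepest exit whose cumulative cost fits the budget, or -1."""
    spent = 0.0
    last = -1
    for i, cost in enumerate(stage_costs):
        if spent + cost > budget:
            break
        spent += cost
        last = i
    return last


def predict_within_budget(
    x: Sequence[float],
    stages: Sequence[Stage],
    exit_heads: Sequence[ExitHead],
    stage_costs: Sequence[float],
    budget: float,
) -> Tuple[int, int]:
    """Run only the stages that fit the budget, then classify with that stage's exit head.

    Returns (predicted_class, exit_index).
    """
    if not (len(stages) == len(exit_heads) == len(stage_costs)):
        raise ValueError("stages, exit_heads and stage_costs must have the same length")
    k = last_affordable_exit(stage_costs, budget)
    if k < 0:
        raise ValueError("budget is too small to reach the first exit")
    features = list(x)
    for stage in stages[: k + 1]:
        features = stage(features)
    scores = exit_heads[k](features)
    predicted_class = max(range(len(scores)), key=scores.__getitem__)
    return predicted_class, k
```

A tighter budget selects an earlier exit, which typically has lower accuracy; that accuracy gap at the early exits is the cost the abstract refers to.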

Crowd Counting on Heavily Compressed Images with Curriculum Pre-Training

A novel training approach called curriculum pre-training (CPT) for crowd counting on compressed images, which alleviates the drop in accuracy resulting from lossy compression.

Efficient High-Resolution Deep Learning: A Survey

This survey describes such efficient high-resolution deep learning methods, summarizes real-world applications of high-resolution deep learning, and provides comprehensive information about available high-resolution datasets.

Multi-Exit Vision Transformer for Dynamic Inference

This work proposes seven different architectures for early exit branches that can be used for dynamic inference in Vision Transformer backbones and shows that each one of these architectures could prove useful in the trade-off between accuracy and speed.



A Comprehensive Survey on Curriculum Learning

This article summarizes existing CL designs based on the general framework of Difficulty Measurer + Training Scheduler and further categorizes the methodologies for automatic CL into four groups, i.e., Self-paced Learning, Transfer Teacher, RL Teacher, and Other Automatic CL.
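
The Difficulty Measurer + Training Scheduler framing can be illustrated with a short sketch. This is an assumption-laden toy, not a method from the survey or from the main paper: `linear_pacing`, `curriculum_subset`, the `start_fraction` parameter, and the linear schedule are hypothetical choices. The difficulty measurer is any function that scores a sample, and the scheduler decides what fraction of the easiest samples is available at each epoch.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def linear_pacing(epoch: int, total_epochs: int, start_fraction: float = 0.2) -> float:
    """Training scheduler (hypothetical): fraction of the data available at `epoch`.

    Grows linearly from `start_fraction` at epoch 0 to 1.0 at the final epoch.
    """
    if total_epochs <= 1:
        return 1.0
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start_fraction + (1.0 - start_fraction) * progress


def curriculum_subset(
    samples: Sequence[T],
    difficulty: Callable[[T], float],
    epoch: int,
    total_epochs: int,
    start_fraction: float = 0.2,
) -> List[T]:
    """Return the easiest samples allowed at this epoch, ordered from easy to hard."""
    ordered = sorted(samples, key=difficulty)  # difficulty measurer supplies the ordering
    fraction = linear_pacing(epoch, total_epochs, start_fraction)
    count = max(1, min(len(ordered), round(fraction * len(ordered))))
    return ordered[:count]
```

In a training loop, each epoch would draw its mini-batches from `curriculum_subset(...)` instead of the full dataset, so that harder samples enter training gradually.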

Why Should We Add Early Exits to Neural Networks?

This paper provides a comprehensive introduction to this family of neural networks, by describing in a unified fashion the way these architectures can be designed, trained, and actually deployed in time-constrained scenarios.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Learning Multiple Layers of Features from Tiny Images

It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.

Rethinking the Inception Architecture for Computer Vision

This work explores ways to scale up networks that use the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.

To Improve Service Reliability for AI-Powered Time-Critical Services Using Imperfect Transmission in MEC: An Experimental Study

It is shown that AI-powered time-critical services can tolerate small image distortion and still retain their inference accuracy, and that minimizing the timeout probability by shortening transmission latency is more important than achieving perfect error-free transmission.

Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference

It is shown experimentally that by equipping existing backbones with such robust adaptive inference, the resulting RDI-Nets can achieve better accuracy and robustness, yet with over 30% computational savings, compared to the defended original models.

Curriculum-based Teacher Ensemble for Robust Neural Network Distillation

It is experimentally demonstrated that the selected teacher can indeed have a significant effect on knowledge transfer; the proposed method is motivated by the way humans learn through a curriculum and is supported by recent findings that hint at the existence of critical learning periods in neural networks.

Distillation-Based Training for Multi-Exit Architectures

Experiments show that distillation-based training significantly improves the accuracy of early exits while maintaining state-of-the-art accuracy for late ones, and that it allows a straightforward extension to semi-supervised learning, i.e., also making use of unlabeled data at training time.