Conflicting Bundles: Adapting Architectures Towards the Improved Training of Deep Neural Networks

@article{Peer2021ConflictingBA,
  title={Conflicting Bundles: Adapting Architectures Towards the Improved Training of Deep Neural Networks},
  author={David Peer and Sebastian Stabinger and Antonio Jose Rodr{\'i}guez-S{\'a}nchez},
  journal={2021 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2021},
  pages={256-265}
}
Designing neural network architectures is a challenging task, and knowing which specific layers of a model must be adapted to improve performance is almost a mystery. In this paper, we introduce a novel theory and metric to identify layers that decrease the test accuracy of the trained model; this identification is possible as early as the beginning of training. In the worst case, such a layer can lead to a network that cannot be trained at all. More precisely, we identified those layers…
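The abstract describes a conflict signal that is measurable at the start of training and points to harmful layers. Below is a minimal sketch of one way such a conflict could be measured for a single layer, assuming the core idea is that samples with different labels whose layer outputs become numerically indistinguishable fall into the same "bundle". The function name, the tolerance `eps`, and the pairwise-distance grouping are illustrative assumptions, not the paper's exact metric.

```python
# Illustrative sketch (not the paper's exact metric): measure how many samples
# in a batch share a layer output with a sample of a different class.
import torch


def conflicting_fraction(activations: torch.Tensor,
                         labels: torch.Tensor,
                         eps: float = 1e-6) -> float:
    """Fraction of samples whose layer output is numerically identical to
    that of a differently-labelled sample.

    activations: (N, ...) outputs of one layer for a batch of N samples.
    labels:      (N,)     integer class labels.
    eps:         assumed tolerance below which two outputs count as identical.
    """
    flat = activations.reshape(activations.shape[0], -1)
    dist = torch.cdist(flat, flat)                 # pairwise L2 distances, shape (N, N)
    same_output = dist <= eps                      # samples mapped to ~the same point
    diff_label = labels.unsqueeze(0) != labels.unsqueeze(1)
    conflicting = (same_output & diff_label).any(dim=1)
    return conflicting.float().mean().item()
```

One would evaluate this on every layer's output for a batch right after initialization; a layer with a high fraction merges samples that the loss still needs to separate, which is the kind of layer the paper's metric aims to flag early.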

Citations

Orchid: Building Dynamic Test Oracles with Training Bias for Improving Deep Neural Network Models
  • Haipeng Wang, W. K. Chan
  • 2021 8th International Conference on Dependable Systems and Their Applications (DSA)
  • 2021
The accuracy of deep neural network models is always a top priority in developing these models. One problem affecting it is the extent to which such a model can resolve training samples conflicting with…
Training Deep Capsule Networks with Residual Connections
TLDR
This paper proposes a methodology to train deeper capsule networks using residual connections, which is evaluated on four datasets and three different routing algorithms and shows that performance in fact increases when training deeper capsule networks.
Greedy Layer Pruning: Decreasing Inference Time of Transformer Models
Fine-tuning transformer models after unsupervised pre-training reaches very high performance on many different NLP tasks. Unfortunately, transformers suffer from long inference times, which greatly…

References

Showing 1-10 of 34 references
Data-dependent Initializations of Convolutional Neural Networks
TLDR
This work presents a fast and simple data-dependent initialization procedure that sets the weights of a network such that all units in the network train at roughly the same rate, avoiding vanishing or exploding gradients.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated by scaling up MobileNets and ResNet.
On the Expressive Power of Deep Neural Networks
We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.
The Shattered Gradients Problem: If resnets are the answer, then what is the question?
TLDR
It is shown that the correlation between gradients in standard feedforward networks decays exponentially with depth, resulting in gradients that resemble white noise, whereas the gradients in architectures with skip connections are far more resistant to shattering, decaying sublinearly.
Understanding the difficulty of training deep feedforward neural networks
TLDR
The objective here is to understand better why standard gradient descent from random initialization performs so poorly with deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach
TLDR
It is proved that, when ReLU is the only non-linearity, the loss curvature is immune to class-dependent label noise, and it is shown how the label-noise probabilities can be estimated by adapting a recent noise-estimation technique to the multi-class setting, providing an end-to-end framework.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Residual Networks Behave Like Ensembles of Relatively Shallow Networks
TLDR
This work proposes a novel interpretation of residual networks, showing that they can be seen as a collection of many paths of differing length, and reveals one of the key characteristics that seem to enable the training of very deep networks: residual networks avoid the vanishing gradient problem by introducing short paths which can carry the gradient throughout the extent of very deep networks.
Highway Networks
TLDR
A new architecture designed to ease gradient-based training of very deep networks is introduced, characterized by the use of gating units which learn to regulate the flow of information through a network (a minimal sketch of this gating, alongside a plain residual connection, follows this list).
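Several of the cited works (Deep Residual Learning, Highway Networks) revolve around skip connections that ease the training of very deep models. The sketch below contrasts the two connection styles under the standard formulations y = x + F(x) for a residual block and y = T(x) * H(x) + (1 - T(x)) * x for a highway layer; the layer widths, the ReLU body, and the module names are illustrative choices, not the cited papers' exact architectures.

```python
# Illustrative sketch of the two skip-connection styles cited above: a plain
# residual block and a highway layer with a learned transform gate.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Assumed two-layer body F(x); the papers use convolutional variants.
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)                # identity path carries the gradient directly


class HighwayLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # produces the candidate transformation H(x)
        self.gate = nn.Linear(dim, dim)        # produces the transform gate T(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))        # T(x) in (0, 1) regulates information flow
        return t * h + (1.0 - t) * x


x = torch.randn(8, 64)
print(ResidualBlock(64)(x).shape, HighwayLayer(64)(x).shape)  # both: torch.Size([8, 64])
```

The residual path is a fixed identity shortcut, while the highway gate learns per-unit how much of the input to pass through unchanged; both keep a short path for the gradient through very deep stacks.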