On a Sparse Shortcut Topology of Artificial Neural Networks

Fenglei Fan, Dayang Wang, Hengtao Guo, Qikui Zhu, Pingkun Yan, Ge Wang, and Hengyong Yu. IEEE Transactions on Artificial Intelligence.
In established network architectures, shortcut connections are often used to feed the outputs of earlier layers as additional inputs to later layers. Despite the extraordinary effectiveness of shortcuts, open questions remain about their mechanism and characteristics. For example, why are shortcuts powerful? Why do shortcuts generalize well? In this article, we investigate the expressivity and generalizability of a novel sparse shortcut topology. First, we demonstrate that this topology can…
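The shortcut mechanism described above can be sketched in a few lines; the two-layer toy network, its sizes, and the `relu` helper below are illustrative assumptions, not the paper's actual topology.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)          # input vector
W1 = rng.standard_normal((8, 8))    # first-layer weights (toy sizes)
W2 = rng.standard_normal((8, 8))    # second-layer weights

h = relu(W1 @ x)                    # an earlier layer's output
# Shortcut: the later layer receives an earlier output (here, the
# input x itself) in addition to its regular input h.
y_plain = relu(W2 @ h)              # without a shortcut
y_short = relu(W2 @ h + x)          # with an additive shortcut
```

Both outputs have the same shape; the shortcut only changes what information reaches the later layer.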

SQ-Swin: a Pretrained Siamese Quadratic Swin Transformer for Lettuce Browning Prediction

A deep learning model for lettuce browning prediction using a pretrained Siamese Quadratic Swin (SQ-Swin) transformer, which is the first of its kind and outperforms traditional methods and other deep learning backbones.

On Expressivity and Training of Quadratic Networks

An effective and efficient training strategy referred to as ReLinear is proposed to stabilize the training process of a quadratic network, thereby unleashing its full potential in the associated machine learning tasks.
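A quadratic neuron and the spirit of ReLinear-style initialization can be sketched as below; the specific parameterization (product of two linear terms plus a power term) and the exact initial values are assumptions for illustration, not necessarily the paper's formulation.

```python
import numpy as np

def quadratic_neuron(x, wr, br, wg, bg, wb, c):
    """One quadratic neuron: the product of two linear terms plus a
    power term (a common parameterization; an assumption here)."""
    return (wr @ x + br) * (wg @ x + bg) + wb @ (x * x) + c

rng = np.random.default_rng(1)
x = rng.standard_normal(4)
wr, br = rng.standard_normal(4), 0.5

# ReLinear-style initialization: the quadratic parts start at an
# identity/zero configuration, so the neuron begins as a linear neuron
# and the quadratic terms are learned gradually during training.
wg, bg = np.zeros(4), 1.0
wb, c = np.zeros(4), 0.0

out = quadratic_neuron(x, wr, br, wg, bg, wb, c)
```

At this initialization the neuron's output equals `wr @ x + br`, i.e., it behaves exactly like a conventional linear neuron.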

CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising

Low-dose computed tomography (LDCT) denoising is an important problem in CT research. Compared to normal-dose CT (NDCT), LDCT images suffer from severe noise and artifacts.

Neural Network Gaussian Processes by Increasing Depth

This work uses a shortcut network to show that increasing the depth of a neural network can also give rise to a Gaussian process, which is a valuable addition to the existing theory and contributes to revealing the true picture of deep learning.

Expressivity and Trainability of Quadratic Networks

An effective and efficient training strategy referred to as ReLinear is proposed to stabilize the training process of a quadratic network, thereby unleashing its full potential in the associated machine learning tasks.

Error Bounds for a Matrix-Vector Product Approximation with Deep ReLU Neural Networks

The derived error bounds offer theoretical insight and guarantees for the development of algorithms based on deep ReLU FNNs, and various applications motivated by accurate matrix-vector product approximation with deep ReLU FNNs are discussed.

Quasi-Equivalence of Width and Depth of Neural Networks

This work establishes a quasi-equivalence between wide and deep classification ReLU networks via a data-driven version of De Morgan's law, so that essentially the same capability of the original network can be implemented by its transformed counterpart.

Deep Residual Learning for Image Recognition

This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
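The residual learning idea summarized above is simply to let the layers fit a residual mapping F(x) and output F(x) + x through an identity shortcut. A minimal sketch, with toy sizes and a plain two-layer F(x) as assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """A minimal residual block: the layers learn F(x) = W2·relu(W1·x)
    and the block outputs relu(F(x) + x) via an identity shortcut."""
    return relu(W2 @ relu(W1 @ x) + x)

rng = np.random.default_rng(2)
x = rng.standard_normal(6)
# With zero weights F(x) = 0, so the block reduces to the identity
# (through the final ReLU) -- the reason very deep stacks stay trainable.
y = residual_block(x, np.zeros((6, 6)), np.zeros((6, 6)))
```

Here `y` equals `relu(x)`: a residual block can do no worse than pass its input through, which eases optimization as depth grows.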

ResNet with one-neuron hidden layers is a Universal Approximator

We demonstrate that a very deep ResNet with stacked modules that have one neuron per hidden layer and ReLU activation functions can uniformly approximate any Lebesgue-integrable function in $d$ dimensions.

FractalNet: Ultra-Deep Neural Networks without Residuals

In experiments, fractal networks match the excellent performance of standard residual networks on both CIFAR and ImageNet classification tasks, thereby demonstrating that residual representations may not be fundamental to the success of extremely deep convolutional neural networks.

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size

This work proposes a small DNN architecture called SqueezeNet, which achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters and is able to compress to less than 0.5MB (510x smaller than AlexNet).

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Network In Network

With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
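The global-average-pooling classifier mentioned above replaces fully connected layers by averaging each final feature map into one class score. A short sketch, with toy channel and spatial sizes as assumptions:

```python
import numpy as np

# Toy final feature maps: 10 channels (one per class) of 7x7 spatial maps.
rng = np.random.default_rng(3)
feature_maps = rng.standard_normal((10, 7, 7))

# Global average pooling: each feature map collapses to a single scalar,
# giving class scores directly with no fully connected layer (and hence
# no extra parameters to overfit).
class_scores = feature_maps.mean(axis=(1, 2))
probs = np.exp(class_scores) / np.exp(class_scores).sum()  # softmax
```

Each score is directly tied to one feature map, which is what makes the classification layer easy to interpret.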

LambdaNetworks: Modeling Long-Range Interactions Without Attention

The resulting neural network architectures, LambdaNetworks, significantly outperform their convolutional and attentional counterparts on ImageNet classification, COCO object detection and instance segmentation, while being more computationally efficient.

Training data-efficient image transformers & distillation through attention

This work produces a competitive convolution-free transformer by training on Imagenet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token ensuring that the student learns from the teacher through attention.

Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification

The approach, Latent-Posterior BNN (LP-BNN), is compatible with the recent BatchEnsemble method, leading to highly efficient ensembles, and attains competitive results across multiple metrics on several challenging benchmarks for image classification, semantic segmentation, and out-of-distribution detection.

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
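The "16x16 words" in the title refers to splitting an image into fixed-size patches and flattening each one into a token vector. A sketch of that patchification step, where the 224x224 input and the toy embedding width are assumptions (the 16x16 patch size follows the title):

```python
import numpy as np

# A 224x224 RGB image split into 16x16 patches, each flattened into a
# vector ("word") that the transformer treats like a token.
img = np.zeros((224, 224, 3))
P = 16
patches = img.reshape(224 // P, P, 224 // P, P, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, P * P * 3)
# 14 * 14 = 196 patches, each of length 16 * 16 * 3 = 768.

# Each patch is then linearly projected to the transformer width
# (512 here is a toy choice):
W_embed = np.zeros((768, 512))
tokens = patches @ W_embed
```

The transformer then operates on this 196-token sequence exactly as it would on a sentence of words.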