# On a Sparse Shortcut Topology of Artificial Neural Networks

@inproceedings{Fan2018OnAS,
  title={On a Sparse Shortcut Topology of Artificial Neural Networks},
  author={Fenglei Fan and Dayang Wang and Hengtao Guo and Qikui Zhu and Pingkun Yan and Ge Wang and Hengyong Yu},
  year={2018}
}

Over recent years, deep learning has become the mainstream data-driven approach to solving many important real-world problems. In successful network architectures, shortcut connections, which feed the outputs of earlier layers as additional inputs to later layers, are well established and have produced excellent results. Despite the extraordinary effectiveness of shortcuts, important questions remain about their underlying mechanism and associated functionalities. For example, why are…
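The shortcut mechanism the abstract describes can be illustrated in a few lines. The following toy NumPy sketch (names and shapes are illustrative, not from the paper) contrasts a plain layer with a shortcut (residual) block, where the block's input is added back to its output so later layers see earlier activations directly:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def plain_block(x, w):
    # A plain layer: the output depends only on the transformed input.
    return relu(w @ x)

def shortcut_block(x, w):
    # A shortcut (residual) block: the input is added back to the
    # output, so the earlier activation is passed on unchanged even
    # if the layer itself contributes nothing.
    return relu(w @ x) + x

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
w = np.zeros((4, 4))  # a "dead" layer that maps everything to zero

print(plain_block(x, w))     # all zeros: the signal is erased
print(shortcut_block(x, w))  # equals x: the shortcut preserves it
```

The zero weight matrix makes the point concrete: without the shortcut the signal vanishes, while with it the identity path keeps information (and gradients) flowing to later layers.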


#### One Citation

Neural Network Gaussian Processes by Increasing Depth

- Computer Science
- ArXiv
- 2021

Recent years have witnessed an increasing interest in the correspondence between infinitely wide networks and Gaussian processes. Despite the effectiveness and elegance of the current neural network…

#### References

Showing 1–10 of 81 references

On Exact Computation with an Infinitely Wide Neural Net

- Computer Science, Mathematics
- NeurIPS
- 2019

This paper gives the first efficient exact algorithm for computing the extension of the NTK to convolutional neural networks, called the Convolutional NTK (CNTK), along with an efficient GPU implementation of this algorithm.

Universal Approximation with Quadratic Deep Networks

- Computer Science, Mathematics
- Neural Networks
- 2020

The main contributions are four interconnected theorems shedding light upon four questions and demonstrating the merits of a quadratic network in terms of expressive efficiency, unique capability, compact architecture, and computational capacity.

The Expressive Power of Neural Networks: A View from the Width

- Computer Science, Mathematics
- NIPS
- 2017

It is shown that there exist classes of wide networks that cannot be realized by any narrow network whose depth is no more than a polynomial bound, and that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide and shallow networks with high accuracy.

Why ResNet Works? Residuals Generalize

- Mathematics, Computer Science
- IEEE Transactions on Neural Networks and Learning Systems
- 2020

According to the obtained generalization bound, regularization terms should be introduced in practice to keep the norms of the weight matrices from growing too large and thereby ensure good generalization ability, which justifies the technique of weight decay.

Sparsely Aggregated Convolutional Networks

- Computer Science
- ECCV
- 2018

This work proposes a new internal connection structure which aggregates only a sparse set of previous outputs at any given depth, and shows that sparse aggregation allows networks to scale more robustly to 1000+ layers, thereby opening future avenues for training long-running visual processes.

Towards Understanding the Importance of Shortcut Connections in Residual Networks

- Computer Science, Mathematics
- NeurIPS
- 2019

It is shown that gradient descent, combined with proper normalization, avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball.

Deep Networks with Stochastic Depth

- Computer Science
- ECCV
- 2016

Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup of training short networks and using deep networks at test time; it reduces training time substantially and significantly improves test error on almost all data sets used for evaluation.
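The train-short/test-deep setup in this snippet can be sketched minimally. In this illustrative Python sketch (a simplification, not the paper's implementation), each residual layer is randomly dropped during training, while at test time every layer runs, scaled by its survival probability:

```python
import random

def stochastic_depth_forward(x, layers, survival_prob, training):
    """Apply residual layers with stochastic depth.

    Training: each layer survives with probability `survival_prob`,
    so the effective network is shorter on average.
    Test: all layers run, scaled by `survival_prob` to match the
    expected training-time contribution.
    """
    for f in layers:
        if training:
            if random.random() < survival_prob:
                x = x + f(x)
        else:
            x = x + survival_prob * f(x)
    return x

# Toy usage: three constant "residual branches" on a scalar input.
layers = [lambda v: 1.0] * 3
out = stochastic_depth_forward(0.0, layers, survival_prob=0.5, training=False)
print(out)  # 1.5: each of the 3 layers contributes 0.5 * 1.0 at test time
```

The test-time scaling is what makes the expected depth during training consistent with the full-depth network used for evaluation.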

Deep vs. shallow networks: An approximation theory perspective

- Computer Science, Mathematics
- ArXiv
- 2016

A new definition of relative dimension is proposed to encapsulate different notions of sparsity of a function class that can possibly be exploited by deep networks but not by shallow ones to drastically reduce the complexity required for approximation and learning.

SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data

- Computer Science, Mathematics
- ICLR
- 2018

This work proves convergence rates of SGD to a global minimum and provides generalization guarantees for this global minimum that are independent of the network size, and shows that SGD can avoid overfitting despite the high capacity of the model.

Why Deep Neural Networks for Function Approximation?

- Computer Science
- ICLR
- 2017

It is shown that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the number needed by a deep network for a given degree of function approximation.