Corpus ID: 235458480

On a Sparse Shortcut Topology of Artificial Neural Networks

Fenglei Fan, Dayang Wang, Hengtao Guo, Qikui Zhu, Pingkun Yan, Ge Wang, Hengyong Yu
Over recent years, deep learning has become the mainstream data-driven approach for solving many important real-world problems. In successful network architectures, shortcut connections are well established, feeding the outputs of earlier layers as additional inputs to later layers, and they have produced excellent results. Despite the extraordinary effectiveness of shortcuts, important questions remain about the underlying mechanism and associated functionalities. For example, why are…
Neural Network Gaussian Processes by Increasing Depth
Recent years have witnessed increasing interest in the correspondence between infinitely wide networks and Gaussian processes. Despite the effectiveness and elegance of the current neural network…


On Exact Computation with an Infinitely Wide Neural Net
The current paper gives the first efficient exact algorithm for computing the extension of the NTK to convolutional neural nets, which is called the Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.
Universal Approximation with Quadratic Deep Networks
The main contributions are four interconnected theorems shedding light on four questions and demonstrating the merits of a quadratic network in terms of expressive efficiency, unique capability, compact architecture, and computational capacity.
The Expressive Power of Neural Networks: A View from the Width
It is shown that there exist classes of wide networks which cannot be realized by any narrow network whose depth is no more than a polynomial bound, and that narrow networks whose size exceeds the polynomial bound by a constant factor can approximate wide and shallow networks with high accuracy.
Why ResNet Works? Residuals Generalize
According to the obtained generalization bound, regularization terms should be introduced in practice to keep the norms of the weight matrices from growing too large, which ensures good generalization ability and justifies the technique of weight decay.
Sparsely Aggregated Convolutional Networks
This work proposes a new internal connection structure which aggregates only a sparse set of previous outputs at any given depth, and shows that sparse aggregation allows networks to scale more robustly to 1000+ layers, thereby opening future avenues for training long-running visual processes.
Towards Understanding the Importance of Shortcut Connections in Residual Networks
It is shown that gradient descent, combined with proper normalization, avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball.
Deep Networks with Stochastic Depth
Stochastic depth is proposed, a training procedure that enables the seemingly contradictory setup of training short networks and using deep networks at test time; it reduces training time substantially and improves the test error significantly on almost all data sets used for evaluation.
Deep vs. shallow networks : An approximation theory perspective
A new definition of relative dimension is proposed to encapsulate different notions of sparsity of a function class that can possibly be exploited by deep networks but not by shallow ones to drastically reduce the complexity required for approximation and learning.
SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
This work proves convergence rates of SGD to a global minimum and provides generalization guarantees for this global minimum that are independent of the network size, and shows that SGD can avoid overfitting despite the high capacity of the model.
Why Deep Neural Networks for Function Approximation?
It is shown that, for a large class of piecewise smooth functions, the number of neurons needed by a shallow network to approximate a function is exponentially larger than the number of neurons needed by a deep network for a given degree of function approximation.