• Corpus ID: 232427897

Using activation histograms to bound the number of affine regions in ReLU feed-forward neural networks

  • Peter Hinz
  • Published 31 March 2021
  • Computer Science, Mathematics
  • ArXiv
Several current bounds on the maximal number of affine regions of a ReLU feed-forward neural network are special cases of the framework [1], which relies on layer-wise activation histogram bounds. We analyze and partially solve a problem in algebraic topology whose solution would fully exploit this framework. Our partial solution already induces slightly tighter bounds and offers insight into how parameter initialization methods can affect the number of regions. Furthermore, we extend the…
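As a rough illustration of the objects the abstract refers to, the sketch below samples a one-dimensional input grid, records the on/off activation patterns of a single hidden ReLU layer, and tallies how many observed patterns have k active neurons. This is only a toy reading of "activation histogram", not the construction used in [1] or in this paper; the function names, the random Gaussian weights, the grid range, and the layer width are all assumptions chosen for illustration. For a scalar input, a layer of n neurons can produce at most n + 1 distinct patterns, which gives a simple sanity check.

```python
import random
from collections import Counter


def activation_patterns(weights, biases, xs):
    """Collect the distinct ReLU on/off patterns of one hidden layer over scalar inputs xs."""
    patterns = set()
    for x in xs:
        # Neuron i counts as active when its pre-activation w_i * x + b_i is positive.
        patterns.add(tuple(int(w * x + b > 0) for w, b in zip(weights, biases)))
    return patterns


def activation_histogram(patterns):
    """For each k, count how many observed patterns have exactly k active neurons."""
    return Counter(sum(p) for p in patterns)


# Hypothetical toy layer: 5 neurons with standard Gaussian weights and biases.
rng = random.Random(0)
n_neurons = 5
weights = [rng.gauss(0, 1) for _ in range(n_neurons)]
biases = [rng.gauss(0, 1) for _ in range(n_neurons)]

# Dense grid on [-10, 10]; breakpoints outside this range are simply not observed.
xs = [-10 + 20 * i / 20000 for i in range(20001)]

patterns = activation_patterns(weights, biases, xs)
histogram = activation_histogram(patterns)
print(len(patterns), dict(histogram))
```

Sampling can only undercount patterns, so the observed count is a lower estimate of the true number of affine pieces, while n + 1 remains the upper bound in this one-dimensional setting.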
On the Expected Complexity of Maxout Networks
This work shows that the practical complexity of deep ReLU networks is often far from the theoretical maximum, and that the same phenomenon occurs in networks with maxout (multi-argument) activation functions and when considering the decision boundaries in classification tasks.


Bounding and Counting Linear Regions of Deep Neural Networks
The results indicate that a deep rectifier network can only have more linear regions than every shallow counterpart with the same number of neurons if that number exceeds the dimension of the input.
A Framework for the Construction of Upper Bounds on the Number of Affine Linear Regions of ReLU Feed-Forward Neural Networks
Using explicit formulas for a Jordan-like decomposition of the involved matrices, this work presents a framework for deriving upper bounds on the number of regions on which feed-forward neural networks with ReLU activation functions are affine linear.
A General Computational Framework to Measure the Expressiveness of Complex Networks Using a Tighter Upper Bound of Linear Regions
This work proposes a general computational approach to compute a tight upper bound on the number of regions for theoretically any network structure (e.g. DNNs with all kinds of skip connections and residual structures).
Empirical Studies on the Properties of Linear Regions in Deep Neural Networks
Instead of just counting the number of the linear regions, this paper studies their local properties, such as the inspheres, the directions of the corresponding hyperplanes, the decision boundaries, and the relevance of the surrounding regions.
On the Number of Linear Regions of Convolutional Neural Networks
This paper provides several mathematical results needed for studying the linear regions of CNNs, and uses them to derive the maximal and average numbers of linear regions for one-layer and multi-layer ReLU CNNs.
Complexity of Linear Regions in Deep Networks
The theory suggests that, even after training, the number of linear regions is far below exponential, an intuition that matches empirical observations. The work concludes that the practical expressivity of neural networks is likely far below the theoretical maximum, and that this gap can be quantified.
Deep Neural Network Initialization With Decision Trees
By combining the user-friendly features of decision tree models with the flexibility and scalability of deep neural networks, DJINN is an attractive algorithm for training predictive models on a wide range of complex data sets.
Deep ReLU Networks Have Surprisingly Few Activation Patterns
This work shows empirically that the average number of activation patterns for ReLU networks at initialization is bounded by the total number of neurons raised to the input dimension, and suggests that realizing the full expressivity of deep networks may not be possible in practice, at least with current methods.
Dying ReLU and Initialization: Theory and Numerical Examples
This paper rigorously proves that a deep ReLU network will eventually die in probability as the depth goes to infinity, and proposes a new initialization procedure, namely a randomized asymmetric initialization, which can effectively prevent the dying ReLU.
Gradient descent optimizes over-parameterized deep ReLU networks
The key idea of the proof is that Gaussian random initialization followed by gradient descent produces a sequence of iterates that stay inside a small perturbation region centered at the initial weights, in which the training loss function of the deep ReLU networks enjoys nice local curvature properties that ensure the global convergence of gradient descent.