• Corpus ID: 44137277

Representational Power of ReLU Networks and Polynomial Kernels: Beyond Worst-Case Analysis

Frederic Koehler, Andrej Risteski
There has been a large amount of interest, both in the past and particularly recently, in the power of different families of universal approximators, e.g. ReLU networks, polynomials, rational functions. However, current research has focused almost exclusively on understanding this problem in a worst-case setting, e.g. bounding the error of the best infinity-norm approximation in a box. In this setting a high-degree polynomial is required to even approximate a single ReLU. However, in real…
Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits
This work proves that as the RKHS is data-adaptive and task-specific, the residual for $f_*$ lies in a subspace that is potentially much smaller than the orthogonal complement of the RKHS, which formalizes the representation and approximation benefits of neural networks.
Mad Max: Affine Spline Insights Into Deep Learning
A rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators is built and a simple penalty term is proposed that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other.
Reconstruction on Trees and Low-Degree Polynomials
The results clarify some of the limitations of low-degree polynomials vs. polynomial time algorithms for Bayesian estimation problems and complement recent work of Moitra, Mossel, and Sandon who studied the circuit complexity of Belief Propagation.
Optimization of the Convolutional Neural Networks for Automatic Detection of Skin Cancer
A meta-heuristic optimized CNN classifier is applied for pre-trained network models for visual datasets with the purpose of classifying skin cancer images with better accuracy than other classification methods.
Optimal brain tumor diagnosis based on deep learning and balanced sparrow search algorithm
A tumor is segmented after effectively preprocessing MRI images, and the main features are mined using a combination of the gray‐level cooccurrence matrix and discrete wavelet transform to improve the efficiency of the CNN concerning consistency and accuracy.
Deep learning and optimization algorithms for automatic breast cancer detection
This paper proposes a comprehensive method to locate the cancerous region in the mammogram image that employs image noise reduction, optimal image segmentation based on the convolutional neural network, a grasshopper optimization algorithm, and optimized feature extraction and feature selection based on the grasshopper algorithm, thereby improving precision and decreasing the computational cost.
Computer-aided diagnosis of skin cancer based on soft computing techniques
An automatic computer-aided method for the early diagnosis of skin cancer using the convolutional neural network optimized by satin bowerbird optimization (SBO) has been presented and its efficiency has been indicated by the confusion matrix.
COVID-19 Diagnosis from CT Images with Convolutional Neural Network Optimized by Marine Predator Optimization Algorithm
This study proposes a hybrid method based on a convolutional neural network optimized by a newly introduced metaheuristic, called the marine predator optimization algorithm, which shows higher accuracy and reliability than the compared methods.


Spectrally-normalized margin bounds for neural networks
This bound is empirically investigated for a standard AlexNet network trained with SGD on the MNIST and CIFAR-10 datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and that the presented bound is sensitive to this complexity.
Learning Kernel-Based Halfspaces with the 0-1 Loss
A new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function is described and analyzed, and a hardness result is proved, showing that under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in $L$.
Neural Networks and Rational Functions
When converting a ReLU network to a rational function, the hidden constants depend exponentially on the number of layers, which is shown to be tight; in other words, a compositional representation can be beneficial even for rational functions.
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically
Benefits of Depth in Neural Networks
This result is proved here for a class of nodes termed "semi-algebraic gates" which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees.
On the Computational Efficiency of Training Neural Networks
This paper revisits the computational complexity of training neural networks from a modern perspective and provides both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.
Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks
We provide several new depth-based separation results for feed-forward neural networks, proving that various types of simple and natural functions can be better approximated using deeper networks
The Power of Depth for Feedforward Neural Networks
It is shown that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network, unless its width is exponential in the dimension.
Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review
An emerging body of theoretical results on deep learning including the conditions under which it can be exponentially better than shallow learning are reviewed, together with new results, open problems and conjectures.