Corpus ID: 238419231

Tighter Sparse Approximation Bounds for ReLU Neural Networks

Carles Domingo-Enrich, Youssef Mroueh
A well-known line of work [Barron, 1993, Breiman, 1993, Klusowski and Barron, 2018] provides bounds on the width n of a ReLU two-layer neural network needed to approximate a function f over the ball of radius R up to error ε, when the Fourier-based quantity C_f = (2π)^(−d/2) ∫_{ℝ^d} ‖ξ‖² |f̂(ξ)| dξ is finite. More recently, Ongie et al. [2019] used the Radon transform as a tool for the analysis of infinite-width ReLU two-layer networks. In particular, they introduce the concept of Radon-based R-norms and show…
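As a concrete illustration (not taken from the paper), the Fourier-based quantity can be estimated numerically for a simple one-dimensional function whose transform is known in closed form. The sketch below uses the second Fourier moment relevant for ReLU approximation (Klusowski and Barron, 2018) and a Gaussian test function; the grid sizes are arbitrary choices:

```python
import numpy as np

# Illustration (assumed setup, not from the paper): estimate
#   C_f = (2*pi)^(-d/2) * integral of ||xi||^2 * |f_hat(xi)| d xi
# in d = 1 for f(x) = exp(-x^2 / 2). With the convention
# f_hat(xi) = integral of f(x) exp(-i xi x) dx, the exact value is sqrt(2*pi).
N, L = 4096, 40.0                 # grid size and half-width of the spatial window
x = np.linspace(-L, L, N, endpoint=False)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)

xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)   # angular frequency grid
f_hat = dx * np.abs(np.fft.fft(f))         # |continuous FT|; phase factor dropped
dxi = 2 * np.pi / (N * dx)                 # frequency-grid spacing

C_f = (2 * np.pi) ** -0.5 * np.sum(xi**2 * f_hat) * dxi
print(C_f)   # close to sqrt(2*pi) ~ 2.5066
```

Because both the Gaussian and its transform decay rapidly, the Riemann sum converges quickly; functions with slowly decaying Fourier tails can make C_f large or infinite, which is exactly when these width bounds degrade.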

Lower Bounds for the MMSE via Neural Network Estimation and Their Applications to Privacy

This paper establishes provable lower bounds for the minimum mean-square error (MMSE) based on a two-layer neural network estimator of the MMSE and the Barron constant of an appropriate function of the conditional expectation of Y given X.



A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case

This paper characterizes the norm required to realize a function as a single-hidden-layer ReLU network with an unbounded number of units but bounded Euclidean weight norm, and precisely characterizes which functions can be realized with finite norm.

How do infinite width bounded norm networks look in function space?

This work considers the question of which functions can be captured by ReLU networks with an unbounded number of units but bounded overall Euclidean weight norm, or equivalently, what minimal norm is required to approximate a given function.
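A short, standard derivation (consistent with this line of work, though not quoted from it) shows why bounding the Euclidean weight norm is equivalent to bounding a product-form "path norm": the positive homogeneity of ReLU lets each unit be rescaled without changing the network's function.

```latex
\[
a\,\sigma(w^\top x) \;=\; (\lambda a)\,\sigma\!\big((w/\lambda)^\top x\big)
\qquad \text{for all } \lambda > 0,\ \ \sigma(t)=\max(t,0),
\]
so rescaling a unit leaves the function unchanged, while
\[
\min_{\lambda>0} \tfrac12\!\left(\lambda^2 a^2 + \|w\|^2/\lambda^2\right)
\;=\; |a|\,\|w\|
\qquad \text{(AM--GM, with equality at } \lambda^2 = \|w\|/|a|\text{)}.
\]
Summing over units, minimizing the Euclidean norm
$\tfrac12\sum_i \big(a_i^2+\|w_i\|^2\big)$ over all networks realizing a given
function is therefore equivalent to minimizing $\sum_i |a_i|\,\|w_i\|$.
```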

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

It is proved that for infinite-width two-layer nets, noisy gradient descent optimizes the regularized neural net loss to a global minimum in polynomial iterations.

Breaking the Curse of Dimensionality with Convex Neural Networks

  • F. Bach
  • J. Mach. Learn. Res., 2017
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions, such as rectified linear units, and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.

What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory

A new function space, which is reminiscent of classical bounded variation spaces, that captures the compositional structure associated with deep neural networks is proposed, and a representer theorem is derived showing that deep ReLU networks are solutions to regularized data fitting problems in this function space.

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

The implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations, is studied, and it is proved that both the normalized margin and its smoothed version converge to the objective value at a KKT point of the optimization problem.

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

It is shown that, when initialized correctly and in the many-particle limit, the gradient flow of such over-parameterized models, although non-convex, converges to global minimizers; the analysis involves Wasserstein gradient flows, a by-product of optimal transport theory.

On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector, providing sharp bounds for the Rademacher and Gaussian complexities of…

Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error

A Law of Large Numbers and a Central Limit Theorem for the empirical distribution are established, which together show that the approximation error of the network universally scales as O(1/n), and the scale and nature of the noise introduced by stochastic gradient descent are quantified.

Understanding neural networks with reproducing kernel Banach spaces