• Corpus ID: 1474026

Breaking the Curse of Dimensionality with Convex Neural Networks

  • Francis R. Bach
  • Published 30 December 2014
  • Computer Science
  • J. Mach. Learn. Res.
We consider neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear… 
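The model class described in the abstract can be illustrated with a minimal sketch. It assumes a finite number of hidden units, a squared loss, and an $\ell_1$ penalty on the output weights as one concrete example of a non-Euclidean regularizer; the names `predict`, `penalized_risk`, and `lam` are hypothetical and do not come from the paper.

```python
import random

def relu(z):
    # Rectified linear unit: non-decreasing and positively homogeneous.
    return max(0.0, z)

def predict(x, hidden, eta):
    # f(x) = sum_j eta_j * relu(w_j . x + b_j), a single hidden layer.
    return sum(e * relu(sum(wi * xi for wi, xi in zip(w, x)) + b)
               for e, (w, b) in zip(eta, hidden))

def penalized_risk(data, hidden, eta, lam):
    # Empirical squared loss plus an l1 penalty on the output weights
    # (the l1 choice is an assumption made for this illustration).
    mse = sum((predict(x, hidden, eta) - y) ** 2 for x, y in data) / len(data)
    return mse + lam * sum(abs(e) for e in eta)

# Toy data and random parameters, for illustration only.
rng = random.Random(0)
d, k = 3, 5
hidden = [([rng.gauss(0, 1) for _ in range(d)], rng.gauss(0, 1)) for _ in range(k)]
eta = [rng.gauss(0, 1) for _ in range(k)]
data = [([rng.gauss(0, 1) for _ in range(d)], rng.gauss(0, 1)) for _ in range(20)]
risk = penalized_risk(data, hidden, eta, lam=0.1)
```

Because the activation is positively homogeneous, scaling a hidden unit's weights and bias by a constant c > 0 scales that unit's output by c, which is why the penalty in this sketch is placed on the output weights only.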


The Curious Case of Convex Neural Networks
Although heavily constrained, IOC-NNs outperform the base multi-layer perceptrons, achieve performance similar to that of base convolutional architectures, and are robust to noise in training labels.
The Curious Case of Convex Networks
The convexity constraints can be enforced on both fully connected and convolutional layers, making them applicable to most architectures, and an ensemble of convex networks can match or outperform its non-convex counterparts.
Convex Geometry and Duality of Over-parameterized Neural Networks
A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.
Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys
Focusing on a class of two-layer neural networks defined by smooth activation functions, it is proved that as soon as the hidden layer size matches the intrinsic dimension of the reproducing space, defined as the linear functional space generated by the activations, no spurious valleys exist, thus allowing the existence of descent directions.
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
It is shown that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.
Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification
This manuscript proposes fitting a neural network with a sparse group lasso penalty on the first-layer input weights, which results in a neural net that only uses a small subset of the original features, and characterizes the statistical convergence of the penalized empirical risk minimizer to the optimal neural network.
Convex Geometry of Two-Layer ReLU Networks: Implicit Autoencoding and Interpretable Models
A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.
Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay
It is proved that for a family of non-smooth activation functions, including ReLU, approximating any single neuron with random features suffers from the curse of dimensionality, providing an explicit separation of expressiveness between neural networks and random feature models.
Analysis of a Two-Layer Neural Network via Displacement Convexity
It is proved that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over $\Omega$, which exhibits a special property known as displacement convexity.
Revealing the Structure of Deep Neural Networks via Convex Duality
It is shown that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set and it is proved that each optimal weight matrix is rank-$K$ and aligns with the previous layers via duality.


Continuous Neural Networks
In the first approach proposed, a finite parametrization is possible, allowing gradient-based learning; in addition, a kernel machine can be made hyperparameter-free and still generalize in spite of an absence of explicit regularization.
On the Number of Linear Regions of Deep Neural Networks
We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep
Efficient agnostic learning of neural networks with bounded fan-in
We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the probably approximately correct (PAC) learning model. In this model, a
Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
The extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
Universal approximation bounds for superpositions of a sigmoidal function
  • A. Barron
  • Computer Science
    IEEE Trans. Inf. Theory
  • 1993
The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings and the integrated squared approximation error cannot be made smaller than order $1/n^{2/d}$ uniformly for functions satisfying the same smoothness assumption.
Convex Neural Networks
Training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem, which involves an infinite number of variables but can be solved by incrementally inserting a hidden unit at a time.
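The idea of inserting one hidden unit at a time can be sketched as follows. This is a simplified, matching-pursuit-style illustration and not the algorithm of the cited paper: it assumes a squared loss, draws candidate ReLU units at random instead of solving the unit-selection subproblem, and fits only the newly added output weight at each step. The names `greedy_fit`, `n_units`, and `n_candidates` are hypothetical.

```python
import random

def relu(z):
    return max(0.0, z)

def unit_output(w, b, X):
    # Output of one hidden ReLU unit on every input in X.
    return [relu(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X]

def greedy_fit(X, y, n_units=10, n_candidates=50, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    residual = list(y)
    units = []
    losses = [sum(r * r for r in residual) / len(y)]
    for _ in range(n_units):
        best = None
        for _ in range(n_candidates):
            # Random candidate unit (an assumption of this sketch).
            w = [rng.gauss(0, 1) for _ in range(d)]
            b = rng.gauss(0, 1)
            h = unit_output(w, b, X)
            hh = sum(v * v for v in h)
            if hh == 0.0:
                continue  # unit is inactive on all inputs
            rh = sum(r * v for r, v in zip(residual, h))
            gain = rh * rh / hh  # decrease in squared residual norm
            if best is None or gain > best[0]:
                best = (gain, w, b, rh / hh, h)
        if best is None:
            break
        _, w, b, eta, h = best
        units.append((w, b, eta))  # insert the unit with its output weight
        residual = [r - eta * v for r, v in zip(residual, h)]
        losses.append(sum(r * r for r in residual) / len(y))
    return units, losses

# Toy regression problem, for illustration only.
rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
y = [abs(x[0]) + 0.5 * x[1] for x in X]
units, losses = greedy_fit(X, y)
```

Each inserted unit receives the output weight that minimizes the squared residual along that unit's direction, so the training loss recorded in `losses` does not increase from one insertion to the next.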
On the near optimality of the stochastic approximation of smooth functions by neural networks
This work considers the problem of approximating the Sobolev class of functions by neural networks with a single hidden layer with a probabilistic approach, based on the Radon and wavelet transforms, and establishes both upper and lower bounds.
Consistency of the group Lasso and multiple kernel learning
  • F. Bach
  • Computer Science, Mathematics
    J. Mach. Learn. Res.
  • 2008
This paper derives necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, and proposes an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non adaptive scheme is not satisfied.
Convexity, Classification, and Risk Bounds
A general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.