# Breaking the Curse of Dimensionality with Convex Neural Networks

```bibtex
@article{Bach2017BreakingTC,
  title   = {Breaking the Curse of Dimensionality with Convex Neural Networks},
  author  = {Francis R. Bach},
  journal = {J. Mach. Learn. Res.},
  year    = {2017},
  volume  = {18},
  pages   = {19:1-19:53}
}
```

We consider neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear…
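The setting in the abstract, a single hidden ReLU layer with an ℓ1-type penalty on the output weights and an effectively unbounded number of units, can be sketched numerically. The following is a minimal illustration, not the paper's algorithm: it fixes a large random ReLU layer and learns only the output weights by proximal gradient descent (ISTA) on a squared loss with an ℓ1 penalty; all function names and parameters here are mine.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_output_weights(X, y, n_hidden=200, lam=0.1, steps=500, seed=0):
    """Fix a wide random ReLU hidden layer and learn only the output
    weights with an l1 penalty, via ISTA (proximal gradient descent).
    As n_hidden grows, this mimics the convex formulation in which the
    l1 norm of the output weights is the regularizer."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # frozen input weights
    b = rng.standard_normal(n_hidden)                 # frozen biases
    H = relu(X @ W + b)                               # hidden activations, (n, n_hidden)
    # step size from the Lipschitz constant of the smooth part
    step = 1.0 / (np.linalg.norm(H, 2) ** 2 / len(y))
    w = np.zeros(n_hidden)
    for _ in range(steps):
        grad = H.T @ (H @ w - y) / len(y)
        w = w - step * grad
        # soft-thresholding = proximal operator of the l1 norm
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return W, b, w

def predict(X, W, b, w):
    return relu(X @ W + b) @ w
```

Only the output-weight problem is convex here; the paper's analysis concerns the harder question of optimizing over the hidden units themselves as the width grows.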

## 474 Citations

The Curious Case of Convex Neural Networks

- Computer Science · ECML/PKDD
- 2021

Although heavily constrained, IOC-NNs outperform the base multilayer perceptrons, achieve performance comparable to the base convolutional architectures, and show robustness to noise in training labels.

The Curious Case of Convex Networks

- Computer Science · ArXiv
- 2020

The convexity constraints can be enforced on both fully connected and convolutional layers, making them applicable to most architectures, and the ensemble of convex networks can match or outperform the non-convex counterparts.

Convex Geometry and Duality of Over-parameterized Neural Networks

- Computer Science · J. Mach. Learn. Res.
- 2021

A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.

Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys

- Computer Science · ArXiv
- 2018

Focusing on a class of two-layer neural networks defined by smooth activation functions, it is proved that as soon as the hidden layer size matches the intrinsic dimension of the reproducing space, defined as the linear functional space generated by the activations, no spurious valleys exist, thus allowing the existence of descent directions.

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

- Computer Science · COLT
- 2020

It is shown that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions.

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

- Computer Science
- 2017

This manuscript proposes fitting a neural network with a sparse group lasso penalty on the first-layer input weights, which results in a neural net that uses only a small subset of the original features, and characterizes the statistical convergence of the penalized empirical risk minimizer to the optimal neural network.
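The sparse group lasso structure described above, where all outgoing first-layer weights of one input feature form a group so that a zeroed group drops that feature entirely, can be illustrated by its penalty and proximal operator. This is a generic sketch with my own names and parametrization, not the manuscript's code:

```python
import numpy as np

def sparse_group_lasso_penalty(W, lam, alpha):
    """Penalty on first-layer weights W (shape: n_inputs x n_hidden).
    Each row of W (one input feature's outgoing weights) is a group."""
    group = np.sum(np.linalg.norm(W, axis=1))   # group-lasso part
    elem = np.sum(np.abs(W))                    # element-wise lasso part
    return lam * ((1.0 - alpha) * group + alpha * elem)

def prox_sparse_group_lasso(W, step, lam, alpha):
    """Proximal operator: element-wise soft-threshold, then group shrinkage.
    Rows whose entries are all small get zeroed out, removing the feature."""
    V = np.sign(W) * np.maximum(np.abs(W) - step * lam * alpha, 0.0)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(1.0 - step * lam * (1.0 - alpha) / np.maximum(norms, 1e-12), 0.0)
    return V * scale
```

In a proximal-gradient training loop, this operator would be applied to the first-layer weight matrix after each gradient step; the group term is what produces exact feature-level sparsity.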

Convex Geometry of Two-Layer ReLU Networks: Implicit Autoencoding and Interpretable Models

- Computer Science · AISTATS
- 2020

A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.

Linear approximability of two-layer neural networks: A comprehensive analysis based on spectral decay

- Computer Science · ArXiv
- 2021

It is proved that for a family of non-smooth activation functions, including ReLU, approximating any single neuron with random features suffers from the curse of dimensionality, providing an explicit separation of expressiveness between neural networks and random feature models.

Analysis of a Two-Layer Neural Network via Displacement Convexity

- Computer Science, Mathematics · The Annals of Statistics
- 2020

It is proved that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over $\Omega$, which exhibits a special property known as displacement convexity.
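Schematically, the Wasserstein gradient flow referred to above takes the standard mean-field form (notation mine; $R$ denotes the population risk viewed as a functional of the distribution $\rho_t$ over neuron parameters $\theta \in \Omega$):

```latex
\partial_t \rho_t
  \;=\; \nabla_\theta \cdot
  \Bigl( \rho_t \, \nabla_\theta \, \frac{\delta R}{\delta \rho}(\rho_t) \Bigr),
```

and displacement convexity of the risk along Wasserstein geodesics is the property that yields global convergence guarantees for this flow.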

Revealing the Structure of Deep Neural Networks via Convex Duality

- Computer Science · ICML
- 2021

It is shown that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set and it is proved that each optimal weight matrix is rank-$K$ and aligns with the previous layers via duality.

## References

Showing 1–10 of 114 references

Continuous Neural Networks

- Computer Science · AISTATS
- 2007

In the first approach proposed, a finite parametrization is possible, allowing gradient-based learning; in the second, a kernel machine can be made hyperparameter-free and still generalizes despite the absence of explicit regularization.

On the Number of Linear Regions of Deep Neural Networks

- Computer Science · NIPS
- 2014

We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep…

Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

- Computer Science · NIPS
- 2008

The extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

Universal approximation bounds for superpositions of a sigmoidal function

- Computer Science · IEEE Trans. Inf. Theory
- 1993

The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order $1/n^{2/d}$ uniformly for functions satisfying the same smoothness assumption.
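For reference, the two rates contrasted in this summary can be written roughly as follows (stated from memory, so treat as a sketch; $\mu$ is a probability measure supported on a ball of radius $r$, $f_n$ a network with $n$ sigmoidal units):

```latex
\inf_{f_n} \|f - f_n\|_{L^2(\mu)}^2 \;\le\; \frac{(2 r C_f)^2}{n},
\qquad
C_f \;=\; \int_{\mathbb{R}^d} \|\omega\| \, \bigl|\hat f(\omega)\bigr| \, d\omega ,
```

while no fixed $n$-dimensional linear space of functions can achieve worst-case error better than order $n^{-2/d}$ over the class $\{f : C_f \le C\}$, which is the sense in which nonlinear superpositions escape the curse of dimensionality.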

On the tractability of multivariate integration and approximation by neural networks

- Mathematics · J. Complex.
- 2004

Convex Neural Networks

- Computer Science · NIPS
- 2005

Training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem, which involves an infinite number of variables but can be solved by incrementally inserting a hidden unit at a time.
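The incremental view described above, solving an infinite-dimensional convex problem by inserting one hidden unit at a time, can be sketched in a boosting style. This is my own toy illustration, not the paper's method: the exact inner maximization over hidden units is replaced by a crude random search, and the output weights are refit by least squares after each insertion.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def add_units_greedily(X, y, rounds=10, candidates=500, seed=0):
    """Each round selects the candidate ReLU unit whose output is most
    correlated with the current residual (random search stands in for the
    exact inner problem), then refits all output weights by least squares."""
    rng = np.random.default_rng(seed)
    units = []                           # (input weights, bias) of chosen units
    residual = y.astype(float).copy()
    coef = np.zeros(0)
    for _ in range(rounds):
        Wc = rng.standard_normal((candidates, X.shape[1]))
        bc = rng.standard_normal(candidates)
        A = relu(X @ Wc.T + bc)          # activations of all candidates, (n, candidates)
        scores = np.abs(A.T @ residual) / (np.linalg.norm(A, axis=0) + 1e-12)
        j = int(np.argmax(scores))
        units.append((Wc[j], bc[j]))
        H = relu(X @ np.array([u[0] for u in units]).T
                 + np.array([u[1] for u in units]))
        coef, *_ = np.linalg.lstsq(H, y, rcond=None)
        residual = y - H @ coef
    return units, coef
```

Replacing the random search with an exact (or certified approximate) maximization over units is precisely what makes the conditional-gradient view tractable in theory but hard in practice.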

On the near optimality of the stochastic approximation of smooth functions by neural networks

- Computer Science · Adv. Comput. Math.
- 2000

This work considers the problem of approximating the Sobolev class of functions by neural networks with a single hidden layer, using a probabilistic approach based on the Radon and wavelet transforms, and establishes both upper and lower bounds.

Consistency of the group Lasso and multiple kernel learning

- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2008

This paper derives necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, and proposes an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non adaptive scheme is not satisfied.

Convexity, Classification, and Risk Bounds

- Computer Science
- 2006

A general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.

Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces

- Computer Science · J. Mach. Learn. Res.
- 2004

A novel method of dimensionality reduction for supervised learning problems that requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y, and establishes a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces.