# Statistical Guarantees for Regularized Neural Networks

@article{Taheri2021StatisticalGF,
  title   = {Statistical Guarantees for Regularized Neural Networks},
  author  = {Mahsa Taheri and Fang Xie and Johannes Lederer},
  journal = {Neural Networks},
  year    = {2021},
  volume  = {142},
  pages   = {148--161}
}

Neural networks have become standard tools in the analysis of data, but they lack comprehensive mathematical theories. For example, there are very few statistical guarantees for learning neural networks from data, especially for classes of estimators that are used in practice or are at least similar to them. In this paper, we develop a general statistical guarantee for estimators that consist of a least-squares term and a regularizer. We then exemplify this guarantee with ℓ1-regularization, showing…
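The estimator class the abstract describes, a least-squares term plus a regularizer, can be written down concretely. Below is a minimal NumPy sketch of the ℓ1-regularized least-squares objective for a one-hidden-layer ReLU network; the data, network sizes, and weight initialization are toy assumptions for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples with d features (sizes are illustrative).
n, d, width = 50, 3, 8
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# One-hidden-layer ReLU network with weight matrices W1 and W2.
W1 = rng.normal(size=(width, d))
W2 = rng.normal(size=width)

def predict(X, W1, W2):
    """Forward pass: ReLU hidden layer, linear output."""
    return np.maximum(X @ W1.T, 0.0) @ W2

def objective(X, y, W1, W2, lam):
    """Least-squares term plus an l1 regularizer on all weights,
    matching the 'least-squares term + regularizer' estimator class."""
    residual = y - predict(X, W1, W2)
    l1 = np.abs(W1).sum() + np.abs(W2).sum()
    return np.sum(residual ** 2) + lam * l1

print(objective(X, y, W1, W2, lam=0.1))
```

Setting `lam=0` recovers plain least squares; larger values of `lam` shrink the weights toward zero, which is what the paper's ℓ1 example exploits.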


#### 10 Citations

Risk Bounds for Robust Deep Learning

- Computer Science, Mathematics
- ArXiv
- 2020

This paper shows that empirical-risk minimization with unbounded, Lipschitz-continuous loss functions, such as the least-absolute-deviation loss, Huber loss, Cauchy loss, and Tukey's biweight loss, can provide efficient prediction under minimal assumptions on the data.
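The robust losses listed above are all Lipschitz-continuous, which is the key property the cited paper exploits. A small sketch of these loss functions (the tuning constants `c` are the conventional defaults from the robust-statistics literature, not values specified by the paper):

```python
import numpy as np

def huber(r, c=1.345):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= c, 0.5 * r ** 2, c * (a - 0.5 * c))

def cauchy(r, c=2.385):
    """Cauchy loss: logarithmic growth, so large residuals
    have bounded influence on the fit."""
    return 0.5 * c ** 2 * np.log1p((r / c) ** 2)

def tukey_biweight(r, c=4.685):
    """Tukey's biweight: redescending loss that is constant
    for residuals beyond |r| > c."""
    a = np.minimum(np.abs(r) / c, 1.0)
    return c ** 2 / 6.0 * (1.0 - (1.0 - a ** 2) ** 3)

r = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(huber(r))
print(cauchy(r))
print(tukey_biweight(r))
```

All three grow at most linearly in the residual (the least-absolute-deviation loss is simply `np.abs(r)`), which bounds each observation's influence and is what makes them Lipschitz.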

Hierarchical Adaptive Lasso: Learning Sparse Neural Networks with Shrinkage via Single Stage Training

- Computer Science, Mathematics
- ArXiv
- 2020

A novel penalty called Hierarchical Adaptive Lasso (HALO) is presented, which learns to adaptively sparsify the weights of a given network via trainable parameters, without learning a mask.

No Spurious Local Minima: on the Optimization Landscapes of Wide and Deep Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2020

These theories substantiate the common belief that increasing network widths not only improves the expressiveness of deep-learning pipelines but also facilitates their optimization; in particular, they prove that constrained and unconstrained empirical-risk minimization over such networks has no spurious local minima.

Analytic function approximation by path norm regularized deep networks

- Mathematics, Computer Science
- 2021

An entropy bound is provided for the spaces of path-norm-regularized neural networks with piecewise-linear activation functions, such as the ReLU and the absolute-value function, for approximating functions that are analytic on certain regions of ℂ.

Deep neural network approximation of analytic functions

- Computer Science
- ArXiv
- 2021

An oracle inequality for the expected error of the considered penalized deep neural network estimators is derived, based on ε-approximations of functions that are analytic on certain regions of ℂ.

Function approximation by deep neural networks with parameters $\{0,\pm \frac{1}{2}, \pm 1, 2\}$

- Mathematics, Computer Science
- 2021

It is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters $\{0,\pm \frac{1}{2}, \pm 1, 2\}$, and that nonparametric regression estimation with the constructed networks attains the same convergence rate as with sparse networks.

HALO: Learning to Prune Neural Networks with Shrinkage

- Computer Science
- SDM
- 2021

A novel penalty called Hierarchical Adaptive Lasso (HALO) is presented, which learns to adaptively sparsify the weights of a given network via trainable parameters; it is able to learn highly sparse networks with significant gains in performance over state-of-the-art magnitude-pruning methods at the same level of sparsity.

Neural networks with superexpressive activations and integer weights

- Computer Science, Mathematics
- ArXiv
- 2021

The range of integer weights required for ε-approximation of Hölder continuous functions is derived, which leads to a convergence rate of order $n^{-\frac{2\beta}{2\beta+d}} \log^2 n$ for neural network regression estimation of an unknown β-Hölder continuous function from n samples.

Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks

- Computer Science, Mathematics
- ArXiv
- 2021

This paper revisits the vanishing-gradient problem in the context of sigmoid-type activations and uses mathematical arguments to highlight two different sources of the phenomenon, namely large individual parameters and effects across layers, and to illustrate two simple remedies, namely regularization and rescaling.
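The "effects across layers" source of vanishing gradients is easy to demonstrate numerically: the sigmoid's derivative is at most 1/4, so each layer in the chain rule multiplies the gradient by a factor of at most 1/4. A minimal sketch, assuming unit weights and a zero input for simplicity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Backpropagate through a chain of sigmoid layers with weights fixed
# at 1. Each layer contributes a chain-rule factor sigmoid'(x) <= 1/4,
# so the gradient shrinks geometrically with depth.
x = 0.0
grad = 1.0
for layer in range(10):
    s = sigmoid(x)
    grad *= s * (1.0 - s)  # derivative of the sigmoid at x
    x = s                  # activation fed into the next layer

print(grad)  # far smaller than 1 after only 10 layers
```

This is exactly the cross-layer effect the paper targets: regularization keeps individual parameters small, and rescaling counteracts the shrinking factors.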

Layer Sparsity in Neural Networks

- Computer Science, Mathematics
- ArXiv
- 2020

A new notion of sparsity, called layer sparsity, is formulated that concerns the networks' layers and therefore aligns particularly well with the current trend toward deep networks.

#### References

Showing 1-10 of 54 references

Approximation and Estimation for High-Dimensional Deep Learning Networks

- Computer Science, Mathematics
- ArXiv
- 2018

The heart of the analysis is the development of a sampling strategy that demonstrates the accuracy of a sparse covering of deep ramp networks, and lower bounds show that the identified risk is close to being optimal.

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

- Mathematics
- 2017

Neural networks are usually not the tool of choice for nonparametric high-dimensional problems where the number of input features is much larger than the number of observations. Though neural…

L1-regularized Neural Networks are Improperly Learnable in Polynomial Time

- Mathematics, Computer Science
- ICML
- 2016

A kernel-based method is presented such that, with probability at least 1 − δ, it learns a predictor whose generalization error is at most ε worse than that of the neural network; this implies that any sufficiently sparse neural network is learnable in polynomial time.

On the rate of convergence of fully connected very deep neural network regression estimates

- Mathematics, Computer Science
- ArXiv
- 2019

This paper shows that it is possible to get similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions, based on new approximation results concerning deep neural networks.

High-Dimensional Learning under Approximate Sparsity: A Unifying Framework for Nonsmooth Learning and Regularized Neural Networks

- Mathematics
- 2019

High-dimensional statistical learning (HDSL) has been widely applied in data analysis, operations research, and stochastic optimization. Despite the availability of multiple theoretical frameworks,…

Group sparse regularization for deep neural networks

- Computer Science, Mathematics
- Neurocomputing
- 2017

The group Lasso penalty, originally proposed in the linear regression literature, is extended to impose group-level sparsity on the network's connections, where each group is defined as the set of outgoing weights from a unit.
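The grouping described above, one group per unit's outgoing weights, can be sketched in a few lines. Here the row layout of the weight matrix is an assumption chosen for illustration; the penalty is the sum of per-group ℓ2 norms, which drives entire units to zero rather than individual weights:

```python
import numpy as np

def group_lasso(W):
    """Group-lasso penalty on a weight matrix W of shape
    (n_units_in, n_units_out). Each row holds the outgoing weights
    of one unit; penalizing the l2 norm of whole rows encourages
    entire units, not just single weights, to be zeroed out."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

W = np.array([[3.0, 4.0],   # active unit: outgoing norm sqrt(9 + 16) = 5
              [0.0, 0.0]])  # pruned unit contributes nothing
print(group_lasso(W))  # -> 5.0
```

Compare with the plain ℓ1 penalty, which would charge `3 + 4 = 7` here and has no preference for concentrating zeros within a single unit.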

Nonparametric regression using deep neural networks with ReLU activation function

- Mathematics
- 2020

Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network…

Neural Network Learning - Theoretical Foundations

- Computer Science
- 1999

The authors explain the role of scale-sensitive versions of the Vapnik-Chervonenkis dimension in large margin classification and in real prediction, and discuss the computational complexity of neural network learning.

Implicit Regularization in Deep Learning

- Mathematics, Computer Science
- ArXiv
- 2017

It is shown that implicit regularization induced by the optimization method plays a key role in the generalization and success of deep learning models, and different complexity measures that can ensure generalization are studied to explain various observed phenomena in deep learning.

Complexity, Statistical Risk, and Metric Entropy of Deep Nets Using Total Path Variation

- Mathematics, Computer Science
- ArXiv
- 2019

For any ReLU network there is a representation in which the sum of the absolute values of the weights into each node is exactly $1$, and the input layer variables are multiplied by a value $V$…