Statistical Guarantees for Regularized Neural Networks

Mahsa Taheri, Fang Xie and Johannes Lederer. Neural Networks: the official journal of the International Neural Network Society.

Risk Bounds for Robust Deep Learning

This paper shows that empirical-risk minimization with unbounded, Lipschitz-continuous loss functions, such as the least-absolute deviation loss, Huber loss, Cauchy loss, and Tukey's biweight loss, can provide efficient prediction under minimal assumptions on the data.
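
For concreteness, two of the losses named above can be sketched as follows; this is a minimal illustration of "unbounded, Lipschitz-continuous", not code from the paper.

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails.

    Unbounded but Lipschitz-continuous with constant `delta`,
    which is the property such risk bounds rely on.
    """
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

def lad_loss(residual):
    """Least-absolute-deviation loss, Lipschitz with constant 1."""
    return np.abs(residual)
```

Both losses grow at most linearly in the residual, so a few grossly corrupted observations cannot dominate the empirical risk the way they would under squared error.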

Hierarchical Adaptive Lasso: Learning Sparse Neural Networks with Shrinkage via Single Stage Training

A novel penalty called the Hierarchical Adaptive Lasso (HALO) is presented, which learns to adaptively sparsify the weights of a given network via trainable parameters, in a single training stage and without learning a mask.

No Spurious Local Minima: on the Optimization Landscapes of Wide and Deep Neural Networks

These theories substantiate the common belief that increasing network widths not only improves the expressiveness of deep-learning pipelines but also facilitates their optimization, and prove in particular that constrained and unconstrained empirical-risk minimization over such networks has no spurious local minima.

Statistical guarantees for sparse deep learning

The guarantees have a mild dependence on network widths and depths, which means that they support the application of sparse but wide and deep networks from a statistical perspective.

Deep neural network approximation of analytic functions

An oracle inequality for the expected error of the considered penalized deep neural network estimators is derived, for target functions that are analytic on certain regions of ℂ.

Analytic function approximation by path norm regularized deep networks

An entropy bound is provided for the spaces of path-norm-regularized neural networks with piecewise linear activation functions, such as the ReLU and the absolute value function, used to approximate functions that are analytic on certain regions of ℂ.
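
As a rough illustration of the regularizer involved (under the common bias-free convention; the paper's exact definition may differ), the ℓ1 path norm of a feedforward network sums the products of absolute weights along every input-output path, which telescopes into a matrix product:

```python
import numpy as np

def l1_path_norm(weights):
    """l1 path norm of a bias-free feedforward network.

    `weights` is a list [W1, ..., WL] with Wi of shape
    (out_i, in_i). Summing the |w| products over all
    input-output paths equals 1^T |WL| ... |W1| 1.
    """
    v = np.ones(weights[0].shape[1])  # one unit of "path mass" per input
    for W in weights:
        v = np.abs(W) @ v
    return float(v.sum())
```

For a single layer this reduces to the entrywise ℓ1 norm of the weight matrix.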

Reducing Computational and Statistical Complexity in Machine Learning Through Cardinality Sparsity

It is shown that cardinality sparsity can improve deep learning and tensor regression both statistically and computationally, generalizing recent statistical theories in those fields.

Statistical Guarantees for Approximate Stationary Points of Simple Neural Networks

Statistical guarantees are developed for simple neural networks that coincide up to logarithmic factors with those for the global optima but apply to stationary points and the points nearby, supporting, from a mathematical perspective, the common notion that neural networks do not necessarily need to be optimized globally.

Function Approximation by Deep Neural Networks with Parameters {0, ±1/2, ±1, 2}

A. Beknazaryan. Journal of Statistical Theory and Practice, 2022.
In this paper, it is shown that functions in the Hölder class C^β can be approximated by deep neural networks whose parameters are restricted to {0, ±1/2, ±1, 2}.

Distribution Estimation of Contaminated Data via DNN-based MoM-GANs

A non-asymptotic error bound, measured by integral probability metrics over the β-smooth Hölder class, is derived for the DNN-based MoM-GAN estimator, and numerical results show that MoM-GAN outperforms other competitive methods when dealing with contaminated data.
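
The "MoM" in MoM-GAN refers to the median-of-means principle; a minimal sketch of the plain median-of-means mean estimator (not the GAN itself) shows why it tolerates contamination:

```python
import numpy as np

def median_of_means(x, n_blocks=5, seed=0):
    """Median-of-means estimate of E[X]: shuffle, split into
    blocks, average each block, take the median of the block
    means. Outliers can corrupt only the blocks they land in.
    """
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))
```

With 99 samples equal to 1.0 and one equal to 1e6, the single outlier corrupts only one block mean, so the median of the block means is still 1.0, while the plain sample mean is about 10001.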

Approximation and Estimation for High-Dimensional Deep Learning Networks

The heart of the analysis is the development of a sampling strategy that demonstrates the accuracy of a sparse covering of deep ramp networks, and lower bounds show that the identified risk is close to being optimal.

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

This manuscript proposes fitting a neural network with a sparse group lasso penalty on the first-layer input weights, which yields a network that uses only a small subset of the original features, and characterizes the statistical convergence of the penalized empirical risk minimizer to the optimal neural network.
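
A sparse group lasso on the first-layer matrix combines an ℓ1 term (individual weights) with a per-feature group ℓ2 term (whole input columns), so entire features can be switched off. A minimal sketch, with the usual mixing weight `alpha` assumed; the paper's exact scaling conventions may differ:

```python
import numpy as np

def sparse_group_lasso(W, lam=1.0, alpha=0.5):
    """Sparse group lasso penalty on first-layer weights W of
    shape (hidden, features). The l1 part zeroes individual
    weights; the group part (l2 norm of each input column)
    zeroes entire input features at once.
    """
    l1 = np.abs(W).sum()
    group = np.linalg.norm(W, axis=0).sum()
    return lam * (alpha * l1 + (1.0 - alpha) * group)
```

When a column's group norm is driven to zero, the corresponding input feature is disconnected from the whole network, which is what produces the feature-selection effect described above.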

L1-regularized Neural Networks are Improperly Learnable in Polynomial Time

A kernel-based method is presented such that, with probability at least 1 − δ, it learns a predictor whose generalization error is at most ε worse than that of the neural network; this implies that any sufficiently sparse neural network is learnable in polynomial time.

On the rate of convergence of fully connected very deep neural network regression estimates

This paper shows that it is possible to get similar results also for least squares estimates based on simple fully connected neural networks with ReLU activation functions, based on new approximation results concerning deep neural networks.

High-Dimensional Learning under Approximate Sparsity: A Unifying Framework for Nonsmooth Learning and Regularized Neural Networks

High-dimensional statistical learning (HDSL) has been widely applied in data analysis, operations research, and stochastic optimization. Despite the availability of multiple theoretical frameworks, ...

Group sparse regularization for deep neural networks

Norm-based generalisation bounds for multi-class convolutional neural networks

These bounds have no explicit dependence on the number of classes except for logarithmic factors and are asymptotically tight when the weights approach initialisation, making them suitable as a basic ingredient in bounds sensitive to the optimisation procedure.

Neural Network Learning - Theoretical Foundations

The authors explain the role of scale-sensitive versions of the Vapnik-Chervonenkis dimension in large-margin classification and in real-valued prediction, and discuss the computational complexity of neural network learning.

Implicit Regularization in Deep Learning

It is shown that implicit regularization induced by the optimization method plays a key role in the generalization and success of deep learning models, and different complexity measures that can ensure generalization are studied to explain various observed phenomena in deep learning.

High-Dimensional Probability: An Introduction with Applications in Data Science

Cambridge University Press, 2018. A random projection of a set T in ℝ^n onto an m-dimensional subspace approximately preserves the geometry of T if m ≳ d(T).