# Robust and Resource-Efficient Identification of Two Hidden Layer Neural Networks

@article{Fornasier2019RobustAR,
  title={Robust and Resource-Efficient Identification of Two Hidden Layer Neural Networks},
  author={Massimo Fornasier and Timo Klock and Michael Rauchensteiner},
  journal={Constructive Approximation},
  year={2019},
  volume={55},
  pages={475--536}
}
• Published 30 June 2019
• Computer Science
• Constructive Approximation
We address the structure identification and the uniform approximation of two fully nonlinear layer neural networks of the type $$f(x)=1^T h(B^T g(A^T x))$$ on $$\mathbb{R}^d$$, where $$g=(g_1,\dots,g_{m_0})$$, $$h=(h_1,\dots,h_{m_1})$$, $$A=(a_1|\dots|a_{m_0}) \in \mathbb{R}^{d \times m_0}$$ and $$B=(b_1|\dots|b_{m_1}) \in \mathbb{R}^{m_0 \times m_1}$$…
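Read literally, the model class is easy to instantiate. Below is a minimal NumPy sketch of the two-hidden-layer architecture f(x) = 1^T h(B^T g(A^T x)); the dimensions and the tanh activations are illustrative placeholders, not choices taken from the paper.

```python
import numpy as np

# Minimal sketch of the network class in the abstract: f(x) = 1^T h(B^T g(A^T x)).
# Dimensions and the tanh activations are hypothetical placeholders.
d, m0, m1 = 20, 10, 5

rng = np.random.default_rng(0)
A = rng.standard_normal((d, m0))    # first-layer weights, columns a_1, ..., a_{m0}
B = rng.standard_normal((m0, m1))   # second-layer weights, columns b_1, ..., b_{m1}

def g(z):
    # stands in for the entrywise activations g = (g_1, ..., g_{m0})
    return np.tanh(z)

def h(z):
    # stands in for the entrywise activations h = (h_1, ..., h_{m1})
    return np.tanh(z)

def f(x):
    """Two fully nonlinear layers summed by the all-ones vector: 1^T h(B^T g(A^T x))."""
    return np.sum(h(B.T @ g(A.T @ x)))

x = rng.standard_normal(d)
print(f(x))
```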
## 12 Citations
• Computer Science
Information and Inference: A Journal of the IMA
• 2021
This paper addresses the structure identification and the uniform approximation of sums of ridge functions, a general form of shallow feed-forward neural network, from a small number of query samples, and proves that the proposed program successfully identifies weight vectors that are close to orthonormal.
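For concreteness, a sum of ridge functions is a map of the form

$$
f(x) = \sum_{i=1}^{m} g_i(a_i \cdot x), \qquad x \in \mathbb{R}^d,
$$

i.e. the one-hidden-layer special case of the two-layer model $$f(x)=1^T h(B^T g(A^T x))$$ studied in the main paper.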
• Mathematics, Computer Science
Constructive Approximation
• 2022
An embedding for ReLU neural networks of any depth, Φ(θ), is introduced that is invariant to scalings and that provides a locally linear parameterization of the realization of the network.
• Mathematics, Computer Science
• 2021
• Computer Science
ArXiv
• 2022
This paper provides constructive methods and theoretical guarantees for the identification of two-layer networks with a number of neurons m = O(D), where D is the input dimension, identifying the signs by suitable algebraic evaluations and recovering the biases by empirical risk minimization via gradient descent.
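As a rough illustration of the bias-recovery step mentioned in this summary, the sketch below fits the bias of a single ReLU unit by gradient descent on the empirical squared risk, assuming the weight vector has already been identified; it is a simplified stand-in, not the cited paper's exact procedure or guarantees.

```python
import numpy as np

# Hypothetical sketch: with the weight w already identified, recover the bias b of a
# single ReLU unit relu(w.x + b) by empirical risk minimization with gradient descent.
rng = np.random.default_rng(2)
d, n = 5, 2000
w = rng.standard_normal(d); w /= np.linalg.norm(w)
b_true = 0.3
X = rng.standard_normal((n, d))
y = np.maximum(X @ w + b_true, 0.0)    # noiseless samples of the unknown unit

b = 0.0                                # the only free parameter
lr = 0.5
for _ in range(200):
    z = X @ w + b
    residual = np.maximum(z, 0.0) - y
    grad = np.mean(residual * (z > 0)) # d/db of the empirical squared risk / 2
    b -= lr * grad

print("recovered bias:", round(b, 4), " true bias:", b_true)
```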
• Computer Science
NeurIPS
• 2021
The focus of this work is function approximation with two-layer neural networks (considering both ReLU and polynomial activation functions), where the results significantly improve upon what can be attained with linear (or eluder dimension) methods.
• Computer Science
ArXiv
• 2021
This work presents a polynomial-time algorithm that can learn a depth-two ReLU network from queries under mild general position assumptions, and presents a Poole's inequality test showing that this algorithm can learn most networks in which the number of first-layer neurons is smaller than the input dimension and the number of second-layer neurons.
• Computer Science
• 2021
It is proved that any N × N matrix having the so-called butterfly structure admits an essentially unique factorization into J butterfly factors, and that the factors can be recovered by a hierarchical factorization method, which consists in recursively factorizing the considered matrix into two factors.
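A standard concrete example of this structure is the Walsh–Hadamard matrix of size N = 2^J, which is exactly a product of J sparse butterfly factors. The sketch below only verifies that factorization numerically; it is not the hierarchical recovery algorithm of the cited paper.

```python
import numpy as np
from functools import reduce

# Walsh-Hadamard matrix of size N = 2^J as a product of J butterfly factors
# I_{2^{j-1}} (x) H2 (x) I_{2^{J-j}}, each with exactly 2 nonzeros per row.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])

def butterfly_factors(J):
    """Return the J sparse butterfly factors of the size-2^J Hadamard matrix."""
    return [np.kron(np.kron(np.eye(2 ** (j - 1)), H2), np.eye(2 ** (J - j)))
            for j in range(1, J + 1)]

J = 3
factors = butterfly_factors(J)
H = reduce(np.matmul, factors)           # product of the butterfly factors
H_ref = reduce(np.kron, [H2] * J)        # direct Kronecker construction of H_8
assert np.allclose(H, H_ref)
print("nonzeros per row in each factor:", int((factors[0] != 0).sum(axis=1).max()))
```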
• Computer Science
NeurIPS
• 2021
This work derives quantitative bounds such that any second-order critical point with SPM (subspace power method) objective value exceeding the bound must equal a tensor component in the noiseless case, and must approximate a tensor component in the noisy case, implying that SPM with suitable initialization is a provable, efficient, robust algorithm for low-rank symmetric tensor decomposition.
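The statement can be made concrete on a toy example: for a symmetric rank-r tensor with orthonormal components, the rank-one correlation objective ⟨T, x⊗x⊗x⟩ (used here as a simplified stand-in for the SPM objective) equals 1 exactly at the components and is typically much smaller elsewhere. The sizes below are hypothetical.

```python
import numpy as np

# Illustration only: a rank-r symmetric 3rd-order tensor T = sum_i a_i (x) a_i (x) a_i
# with orthonormal components a_i. This is not the cited paper's algorithm or bound.
rng = np.random.default_rng(1)
n, r = 8, 3
A, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal components a_1, ..., a_r
T = sum(np.einsum('i,j,k->ijk', A[:, i], A[:, i], A[:, i]) for i in range(r))

def objective(x):
    """Cubic form <T, x (x) x (x) x> for a unit vector x."""
    return np.einsum('ijk,i,j,k->', T, x, x, x)

print("at a component:   ", round(objective(A[:, 0]), 6))   # equals 1 for orthonormal a_i
x = rng.standard_normal(n); x /= np.linalg.norm(x)
print("at a random point:", round(objective(x), 6))          # typically much smaller
```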
• Computer Science, Mathematics
ArXiv
• 2022
Two main theorems on symmetric tensor rank with ε-room of tolerance are proved, based on techniques from geometric functional analysis together with rigorous complexity estimates.
