Robust and Resource-Efficient Identification of Two Hidden Layer Neural Networks

@article{Fornasier2019RobustAR,
  title={Robust and Resource-Efficient Identification of Two Hidden Layer Neural Networks},
  author={Massimo Fornasier and Timo Klock and Michael Rauchensteiner},
  journal={Constructive Approximation},
  year={2019},
  volume={55},
  pages={475-536}
}
We address the structure identification and the uniform approximation of two fully nonlinear layer neural networks of the type $$f(x)=1^T h(B^T g(A^T x))$$ on $$\mathbb{R}^d$$, where $$g=(g_1,\dots,g_{m_0})$$, $$h=(h_1,\dots,h_{m_1})$$, $$A=(a_1|\dots|a_{m_0}) \in \mathbb{R}^{d \times m_0}$$ and $$B=(b_1|\dots|b_{m_1}) \in \mathbb{R}^{m_0 \times m_1}$$…
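
To make the model concrete, the sketch below evaluates a network of the form $$f(x)=1^T h(B^T g(A^T x))$$ in NumPy. The dimensions and the single tanh activation are illustrative choices only; in the paper the entrywise activations $$g_1,\dots,g_{m_0}$$ and $$h_1,\dots,h_{m_1}$$ may all differ.

```python
import numpy as np

# Minimal sketch of the two-hidden-layer model f(x) = 1^T h(B^T g(A^T x)).
# Dimensions d, m0, m1 and the activations are illustrative, not from the paper.
d, m0, m1 = 10, 6, 4
rng = np.random.default_rng(0)

A = rng.standard_normal((d, m0))    # first-layer weights, columns a_1, ..., a_{m0}
B = rng.standard_normal((m0, m1))   # second-layer weights, columns b_1, ..., b_{m1}

def g(z):  # entrywise first-layer activation (placeholder)
    return np.tanh(z)

def h(z):  # entrywise second-layer activation (placeholder)
    return np.tanh(z)

def f(x):
    """Evaluate f(x) = 1^T h(B^T g(A^T x)) for a single input x in R^d."""
    return np.sum(h(B.T @ g(A.T @ x)))

x = rng.standard_normal(d)
print(f(x))
```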

Identification of Shallow Neural Networks by Fewest Samples

This paper addresses the structure identification and the uniform approximation of sums of ridge functions, a general form of shallow feed-forward neural network, from a small number of query samples, and proves that this program successfully identifies networks whose weight vectors are close to orthonormal.

An Embedding of ReLU Networks and an Analysis of their Identifiability

An embedding for ReLU neural networks of any depth, Φ ( θ ), is introduced that is invariant to scalings and that provides a locally linear parameterization of the realization of the network.
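
The scaling invariance referred to here stems from the positive homogeneity of ReLU: multiplying the incoming weights of a hidden neuron by c > 0 and its outgoing weight by 1/c leaves the realized function unchanged. A minimal numerical check (the two-layer network and its random weights below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

W1 = rng.standard_normal((5, 3))   # hidden-layer weights
w2 = rng.standard_normal(5)        # output weights

def net(W1, w2, x):
    return w2 @ relu(W1 @ x)

# Rescale one hidden neuron: incoming weights by c, outgoing weight by 1/c.
c = 3.7
W1s, w2s = W1.copy(), w2.copy()
W1s[0] *= c
w2s[0] /= c

x = rng.standard_normal(3)
print(np.isclose(net(W1, w2, x), net(W1s, w2s, x)))   # True: same realization
```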

Affine symmetries and neural network identifiability

Finite Sample Identification of Wide Shallow Neural Networks with Biases

This paper provides constructive methods and theoretical guarantees for the identification of two-layer networks with biases and $m = O(D)$ neurons, $D$ being the input dimension, identifying the signs by suitable algebraic evaluations and recovering the biases by empirical risk minimization via gradient descent.
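
The last step mentioned above, fitting the biases once the weights are in hand, can be sketched as plain gradient descent on the empirical risk. The network size, step size, and the assumption that the inner and outer weights are already known exactly are illustrative simplifications, not the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda z: np.maximum(z, 0.0)

D, m, n = 8, 8, 2000                      # input dim, neurons, samples (illustrative)
W = rng.standard_normal((m, D))           # inner weights, assumed already identified
v = rng.standard_normal(m)                # outer weights, assumed already identified
b_true = rng.uniform(-1.0, 1.0, size=m)   # biases we try to recover

X = rng.standard_normal((n, D))
y = relu(X @ W.T + b_true) @ v            # noiseless teacher outputs

b = np.zeros(m)                           # initialize biases at zero
lr = 0.05
for _ in range(2000):                     # gradient descent on the empirical risk
    pre = X @ W.T + b                     # pre-activations, shape (n, m)
    resid = relu(pre) @ v - y             # residuals, shape (n,)
    grad = ((pre > 0) * v).T @ resid / n  # d(mean squared error)/db, up to a factor 2
    b -= lr * grad

print(np.max(np.abs(b - b_true)))         # small if the descent has converged
```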

Going Beyond Linear RL: Sample Efficient Neural Function Approximation

The focus of this work is function approximation with two-layer neural networks (considering both ReLU and polynomial activation functions), where the results significantly improve upon what can be attained with linear (or eluder dimension) methods.

An Exact Poly-Time Membership-Queries Algorithm for Extracting a Three-Layer ReLU Network

This work presents a polynomial-time algorithm that can learn a depth-two ReLU network from queries under mild general-position assumptions, and presents a test showing that the algorithm can learn most networks in which the number of first-layer neurons is smaller than the input dimension and than the number of second-layer neurons.

Efficient Identification of Butterfly Sparse Matrix Factorizations

It is proved that any N × N matrix having the so-called butterfly structure admits an essentially unique factorization into J butterfly factors, and that the factors can be recovered by a hierarchical factorization method, which consists in recursively factorizing the considered matrix into two factors.
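
As a concrete illustration of one step of such a hierarchical factorization, the N × N DFT matrix (a classical example of a matrix with butterfly structure) splits into a sparse butterfly factor, two half-size DFT blocks, and an even/odd permutation. The check below is a minimal numerical sketch of this single recursion level, not the recovery algorithm of the paper:

```python
import numpy as np

def dft(N):
    """Dense N x N DFT matrix."""
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, k) / N)

# One level of the hierarchical (butterfly) factorization of the DFT:
#   F_N = B_N @ blkdiag(F_{N/2}, F_{N/2}) @ P_N
# with B_N the sparse butterfly factor and P_N the even/odd permutation.
N = 16
half = N // 2
D = np.diag(np.exp(-2j * np.pi * np.arange(half) / N))   # twiddle factors
I = np.eye(half)
B = np.block([[I, D], [I, -D]])                          # butterfly factor
blk = np.kron(np.eye(2), dft(half))                      # blkdiag(F_{N/2}, F_{N/2})
perm = np.concatenate([np.arange(0, N, 2), np.arange(1, N, 2)])
P = np.eye(N)[perm]                                      # even entries first, then odd

print(np.allclose(dft(N), B @ blk @ P))                  # True
```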

Landscape analysis of an improved power method for tensor decomposition

This work derives quantitative bounds such that any second-order critical point with SPM (subspace power method) objective value exceeding the bound must equal a tensor component in the noiseless case, and must approximate a tensor component in the noisy case, implying that SPM with suitable initialization is a provable, efficient, robust algorithm for low-rank symmetric tensor decomposition.
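
For orientation, plain symmetric tensor power iteration, a simpler relative of the SPM studied here and not the paper's method, repeatedly applies the tensor to the current iterate and renormalizes. A minimal sketch for a third-order tensor with orthonormal components (chosen only so that convergence to a component is easy to check):

```python
import numpy as np

rng = np.random.default_rng(3)

# Low-rank symmetric 3rd-order tensor T = sum_r a_r (x) a_r (x) a_r.
d, r = 6, 2
A = np.linalg.qr(rng.standard_normal((d, r)))[0]          # orthonormal components
T = sum(np.einsum('i,j,k->ijk', A[:, s], A[:, s], A[:, s]) for s in range(r))

# Tensor power iteration: u <- T(., u, u) / ||T(., u, u)||.
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
for _ in range(50):
    u = np.einsum('ijk,j,k->i', T, u, u)
    u /= np.linalg.norm(u)

# For orthogonally decomposable tensors, u converges to (plus or minus) a component.
print(np.max(np.abs(A.T @ u)))                            # close to 1
```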

Approximate Low-Rank Decomposition for Real Symmetric Tensors

Two main theorems on symmetric tensor rank with an ε-room of tolerance are proved, based on techniques from geometric functional analysis and rigorous complexity estimates.

References

Showing 1-10 of 72 references

Identification of Shallow Neural Networks by Fewest Samples

This paper addresses the structure identification and the uniform approximation of sums of ridge functions, a general form of shallow feed-forward neural network, from a small number of query samples, and proves that this program successfully identifies networks whose weight vectors are close to orthonormal.

Approximation by ridge functions and neural networks

We investigate the efficiency of approximation by linear combinations of ridge functions in the metric of $L_2(B^d)$, with $B^d$ the unit ball in $\mathbb{R}^d$. If $X_n$ is an $n$-dimensional linear space of univariate…

Provable approximation properties for deep neural networks

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit

This paper shows that the number of hidden units only needs to be larger than a quantity that depends on the regularity properties of the data and is independent of the dimension, and generalizes the analysis to unbounded activation functions.

Neural Networks as Interacting Particle Systems: Asymptotic Convexity of the Loss Landscape and Universal Scaling of the Approximation Error

A Law of Large Numbers and a Central Limit Theorem for the empirical distribution are established, which together show that the approximation error of the network universally scales as $O(n^{-1})$ and quantify the scale and nature of the noise introduced by stochastic gradient descent.

Breaking the Curse of Dimensionality with Convex Neural Networks

F. Bach, J. Mach. Learn. Res., 2017
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Ridge Functions, Sigmoidal Functions and Neural Networks

This paper considers mainly approximation by ridge functions. Fix a point $a \in \mathbb{R}^n$ and a function $g:\mathbb{R}\to\mathbb{R}$. Then the function $f:\mathbb{R}^n\to\mathbb{R}$ defined by $f(x)=g(a\cdot x)$, $x \in \mathbb{R}^n$, is a ridge or plane…
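
A ridge function is constant along all directions orthogonal to $a$; the quick numerical check below uses an arbitrary $a$ and $g$, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

n = 5
a = rng.standard_normal(n)
g = np.cos                               # any univariate function

def f(x):
    """Ridge function f(x) = g(a . x)."""
    return g(a @ x)

x = rng.standard_normal(n)
v = rng.standard_normal(n)
v -= (v @ a) / (a @ a) * a               # make v orthogonal to a

# Moving along a direction orthogonal to a does not change the value of f.
print(np.isclose(f(x), f(x + 3.0 * v)))  # True
```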

Deep Neural Network Approximation Theory

Deep networks provide exponential approximation accuracy, i.e., the approximation error decays exponentially in the number of nonzero weights in the network, for the multiplication operation, polynomials, sinusoidal functions, and certain smooth functions.

Neural Network Learning - Theoretical Foundations

The authors explain the role of scale-sensitive versions of the Vapnik-Chervonenkis dimension in large-margin classification and in real-valued prediction, and discuss the computational complexity of neural network learning.
...