Identification of Shallow Neural Networks by Fewest Samples

@article{Fornasier2018IdentificationOS,
  title={Identification of Shallow Neural Networks by Fewest Samples},
  author={Massimo Fornasier and Jan Vyb{\'i}ral and Ingrid Daubechies},
  journal={ArXiv},
  year={2018},
  volume={abs/1804.01592}
}
We address the uniform approximation of sums of ridge functions $\sum_{i=1}^m g_i(a_i\cdot x)$ on ${\mathbb R}^d$, representing the shallowest form of feed-forward neural network, from a small number of query samples, under mild smoothness assumptions on the functions $g_i$ and near-orthogonality of the ridge directions $a_i$. The sample points are randomly generated and are universal, in the sense that the sampled queries on those points will allow the proposed recovery algorithms to…
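For intuition on why point queries suffice, here is a minimal numerical sketch (not the paper's full recovery algorithm): the gradient of $f(x)=\sum_{i=1}^m g_i(a_i\cdot x)$ always lies in the span of the directions $a_i$, so finite-difference gradient estimates at random points, followed by an SVD, already recover that span. All parameters below (num_points, eps, the choice of the $g_i$) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming f(x) = sum_i g_i(a_i . x): finite-difference
# gradient estimates lie in span{a_i}, so an SVD of many such estimates
# recovers that span. Parameters (num_points, eps) are illustrative only.
import numpy as np

def ridge_sum(x, A, gs):
    """Evaluate f(x) = sum_i g_i(a_i . x), with the a_i given as rows of A."""
    return sum(g(a @ x) for a, g in zip(A, gs))

def estimate_direction_span(f, d, m, num_points=200, eps=1e-4, seed=None):
    """Approximate span{a_1, ..., a_m} from point queries of f alone."""
    rng = np.random.default_rng(seed)
    grads = np.empty((num_points, d))
    for k in range(num_points):
        x = rng.standard_normal(d) / np.sqrt(d)          # random base point
        fx = f(x)
        # forward differences along coordinate axes approximate grad f(x)
        grads[k] = [(f(x + eps * e) - fx) / eps for e in np.eye(d)]
    # the top-m right singular vectors approximately span span{a_1, ..., a_m}
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:m].T

# toy usage: two near-orthogonal ridge directions in R^10
d, m = 10, 2
A = np.linalg.qr(np.random.default_rng(0).standard_normal((d, m)))[0].T
f = lambda x: ridge_sum(x, A, [np.tanh, np.sin])
U = estimate_direction_span(f, d, m)
print(np.linalg.norm(A @ U @ U.T - A))   # small: rows of A lie in the span of U
```

Separating the individual directions $a_i$ inside this span, and then approximating the profiles $g_i$, is the part that requires the additional machinery developed in the paper.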
Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks
TLDR
This work addresses the structure identification and the uniform approximation of neural networks with two fully nonlinear hidden layers, of the type $f(x)=1^T h(B^T g(A^T x))$, and provides guarantees of stable recovery under a posteriori verifiable conditions.
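For concreteness, a minimal sketch of the network class this entry refers to, $f(x)=1^T h(B^T g(A^T x))$, evaluated at a single input; the widths and the choice of activations below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a two-hidden-layer network f(x) = 1^T h(B^T g(A^T x)),
# with entrywise activations g and h; shapes and activations are illustrative.
import numpy as np

def two_layer_net(x, A, B, g=np.tanh, h=np.tanh):
    """Evaluate f(x) = 1^T h(B^T g(A^T x)) for a single input x in R^d."""
    first = g(A.T @ x)        # m_1 first-layer units
    second = h(B.T @ first)   # m_2 second-layer units
    return second.sum()       # the outer 1^T collapses the output to a scalar

d, m1, m2 = 8, 5, 3
rng = np.random.default_rng(0)
A, B = rng.standard_normal((d, m1)), rng.standard_normal((m1, m2))
print(two_layer_net(rng.standard_normal(d), A, B))
```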
Affine Symmetries and Neural Network Identifiability
TLDR
In an effort to answer the identifiability question in greater generality, arbitrary nonlinearities with potentially complicated affine symmetries are considered, and it is shown that the symmetries can be used to find a rich set of networks giving rise to the same function $f$.
Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples
TLDR
It is proved that entangled weights are completely and stably approximated by an efficient and robust algorithm as soon as $O(D \times m)$ nonadaptive input-output samples of the network are collected, where $D$ is the input dimension and $m$ is the number of neurons of the network.
Landscape analysis of an improved power method for tensor decomposition
TLDR
This work derives quantitative bounds such that any second-order critical point with SPM objective value exceeding the bound must equal a tensor component in the noiseless case, and must approximate a tensor component in the noisy case, implying that SPM with suitable initialization is a provable, efficient, robust algorithm for low-rank symmetric tensor decomposition.
Affine symmetries and neural network identifiability
TLDR
This work exhibits a class of “tanh-type” nonlinearities (including the tanh function itself) for which such a network A does not exist, thereby solving the identifiability question for these nonlinearities in full generality.
Going Beyond Linear RL: Sample Efficient Neural Function Approximation
TLDR
The focus of this work is function approximation with two-layer neural networks (considering both ReLU and polynomial activation functions), where the results significantly improve upon what can be attained with linear (or eluder dimension) methods.
Estimating multi-index models with response-conditional least squares
The multi-index model is a simple yet powerful high-dimensional regression model which circumvents the curse of dimensionality assuming $ \mathbb{E} [ Y | X ] = g(A^\top X) $ for some unknown index
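As an illustration of the multi-index model only (this is not the paper's response-conditional least squares estimator), the sketch below simulates data with $\mathbb{E}[Y|X]=g(A^\top X)$ and recovers the index direction with a generic sliced, inverse-regression-style average of $X$ conditioned on the response; the dimensions, noise level, and link $g$ are assumptions made for the example.

```python
# Toy multi-index model E[Y|X] = g(A^T X) with a single index direction.
# The recovery step is a generic sliced-inverse-regression-style estimate,
# NOT the paper's RCLS estimator; it only shows that conditioning on the
# response exposes the index space when X is standard Gaussian.
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 20, 1, 5000
A = np.linalg.qr(rng.standard_normal((d, k)))[0]        # index space, d x k
g = lambda t: np.sin(t) + 0.5 * t                       # unknown link function
X = rng.standard_normal((n, d))
Y = g(X @ A).sum(axis=1) + 0.1 * rng.standard_normal(n)

# slice the response, average X within each slice, and read off the span
num_slices = 10
order = np.argsort(Y)
slice_means = np.array([X[idx].mean(axis=0)
                        for idx in np.array_split(order, num_slices)])
_, _, vt = np.linalg.svd(slice_means, full_matrices=False)
A_hat = vt[:k].T
print(np.abs(A_hat.T @ A))   # close to 1: the estimated direction aligns with A
```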
Subspace power method for symmetric tensor decomposition and generalized PCA
TLDR
Numerical simulations indicate that SPM significantly outperforms state-of-the-art algorithms in terms of speed, while performing robustly for low-rank tensors subjected to additive noise.
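For context, the classical symmetric tensor power iteration, the standard baseline in this line of work (SPM itself is a different, subspace-based variant), can be sketched in a few lines; the rank, weights, and dimensions below are arbitrary illustrative choices.

```python
# Classical power iteration for an order-3 symmetric tensor
# T = sum_i lambda_i a_i (x) a_i (x) a_i: iterate x <- T(I, x, x) / ||.||.
# This is the standard baseline, not the subspace power method (SPM).
import numpy as np

def tensor_apply(T, x):
    """Contract an order-3 tensor with x in its last two modes: T(I, x, x)."""
    return np.einsum('ijk,j,k->i', T, x, x)

def power_iteration(T, iters=100, seed=None):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(T.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = tensor_apply(T, x)
        x = y / np.linalg.norm(y)
    return x, x @ tensor_apply(T, x)   # component estimate and its weight

# toy usage: rank-2 orthogonally decomposable symmetric tensor in R^6
rng = np.random.default_rng(0)
a = np.linalg.qr(rng.standard_normal((6, 2)))[0].T      # orthonormal components
T = sum(w * np.einsum('i,j,k->ijk', v, v, v) for w, v in zip([2.0, 1.0], a))
x, lam = power_iteration(T)
print(lam, np.abs(x @ a.T))   # one entry close to 1: a component was recovered
```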

References

SHOWING 1-10 OF 81 REFERENCES
Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks
TLDR
This work addresses the structure identification and the uniform approximation of neural networks with two fully nonlinear hidden layers, of the type $f(x)=1^T h(B^T g(A^T x))$, and provides guarantees of stable recovery under a posteriori verifiable conditions.
Approximation by ridge functions and neural networks
We investigate the efficiency of approximation by linear combinations of ridge functions in the metric of $L_2(B^d)$, with $B^d$ the unit ball in $\mathbb{R}^d$. If $X_n$ is an $n$-dimensional linear space of univariate
Entropy and Sampling Numbers of Classes of Ridge Functions
We study the properties of ridge functions $f(x)=g(a\cdot x)$ in high dimensions $d$ from the viewpoint of approximation theory. The function classes considered consist of ridge
Breaking the Curse of Dimensionality with Convex Neural Networks
  • F. Bach
  • Computer Science, Mathematics
    J. Mach. Learn. Res.
  • 2017
TLDR
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
Capturing Ridge Functions in High Dimensions from Point Queries
Constructing a good approximation to a function of many variables suffers from the “curse of dimensionality”. Namely, functions on $\mathbb{R}^N$ with smoothness of order $s$ can in general be captured with
Learning Functions of Few Arbitrary Linear Parameters in High Dimensions
TLDR
The approach uses tools taken from the compressed sensing framework, recent Chernoff bounds for sums of positive semidefinite matrices, and classical stability bounds for invariant subspaces of singular value decompositions for computing the approximating function, whose complexity is at most polynomial in the dimension d and in the number m of points.
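One ingredient mentioned in this entry, stability of invariant subspaces of singular value decompositions, admits a quick numerical illustration: the top singular subspace of a low-rank matrix moves only on the order of the perturbation size divided by the spectral gap (a Wedin-type bound). The dimensions, rank, and noise level below are arbitrary choices made for the illustration.

```python
# Numerical illustration of singular-subspace stability (Wedin-type bound):
# a small additive perturbation E moves the top singular subspace of M by
# roughly ||E|| / (spectral gap). Dimensions, rank, and noise are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 3
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))   # rank-r matrix
E = 1e-3 * rng.standard_normal((d, d))                           # perturbation

def top_subspace(X, r):
    u, _, _ = np.linalg.svd(X)
    return u[:, :r]

U, U_pert = top_subspace(M, r), top_subspace(M + E, r)
# distance between subspaces: sine of the largest principal angle
dist = np.linalg.norm(U @ U.T - U_pert @ U_pert.T, 2)
print(dist, np.linalg.norm(E, 2))   # dist is of the order ||E|| / gap
```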
Greedy Layer-Wise Training of Deep Networks
TLDR
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit
TLDR
This paper shows that the number of hidden units only needs to be larger than a quantity dependent on the regularity properties of the data, and independent of the dimensions, and generalizes this analysis to the case of unbounded activation functions.
Ridgelets: estimating with ridge functions
Feedforward neural networks, projection pursuit regression, and more generally, estimation via ridge functions have been proposed as an approach to bypass the curse of dimensionality and are now
On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition
TLDR
It is proved that learning a two-layers neural network that generalizes well is at least as hard as tensor decomposition.