# Identification of Shallow Neural Networks by Fewest Samples

@article{Fornasier2018IdentificationOS,
title={Identification of Shallow Neural Networks by Fewest Samples},
author={Massimo Fornasier and Jan Vyb{\'i}ral and Ingrid Daubechies},
journal={ArXiv},
year={2018},
volume={abs/1804.01592}
}
• Published 4 April 2018
• Computer Science, Mathematics
• ArXiv
We address the uniform approximation of sums of ridge functions $\sum_{i=1}^m g_i(a_i\cdot x)$ on ${\mathbb R}^d$, representing the shallowest form of feed-forward neural network, from a small number of query samples, under mild smoothness assumptions on the functions $g_i$ and near-orthogonality of the ridge directions $a_i$. The sample points are randomly generated and are universal, in the sense that the sampled queries on those points will allow the proposed recovery algorithms to…
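To make the setting concrete, here is a minimal sketch of the model being sampled (not the paper's recovery algorithm): a sum of ridge functions with near-orthogonal directions, evaluated at random, nonadaptive query points. The dimensions and the profiles $g_i$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 20, 3  # illustrative ambient dimension and number of ridge terms

# Near-orthogonal ridge directions a_i: here exactly orthonormal, via QR.
A = np.linalg.qr(rng.standard_normal((d, m)))[0]  # columns a_1, ..., a_m

# Smooth univariate profiles g_i (arbitrary choices for this sketch).
profiles = [np.tanh, np.sin, np.cos]

def f(x):
    """Sum-of-ridge-functions model f(x) = sum_i g_i(a_i . x)."""
    return sum(g(A[:, i] @ x) for i, g in enumerate(profiles))

# Random query points, in the spirit of the universal sampling described in
# the abstract; the actual identification algorithms are in the paper.
X = rng.standard_normal((5, d))
samples = np.array([f(x) for x in X])
```

Each query costs one function evaluation; the paper's point is that few such queries suffice to identify both the directions $a_i$ and the profiles $g_i$.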
## 8 Citations

Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks
• Computer Science, Mathematics
• ArXiv
• 2019
This work addresses the structure identification and the uniform approximation of two fully nonlinear layer neural networks of the type $f(x)=1^T h(B^T g(A^T x))$, and provides guarantees of stable recovery under a posteriori verifiable conditions.

Affine Symmetries and Neural Network Identifiability
• Mathematics, Computer Science
• ArXiv
• 2020
In an effort to answer the identifiability question in greater generality, arbitrary nonlinearities with potentially complicated affine symmetries are considered, and it is shown that the symmetries can be used to find a rich set of networks giving rise to the same function $f$.

Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples
• Computer Science
• ArXiv
• 2021
It is proved that entangled weights are completely and stably approximated by an efficient and robust algorithm as soon as $O(D \times m)$ nonadaptive input-output samples of the network are collected, where $D$ is the input dimension and $m$ is the number of neurons of the network.

Landscape analysis of an improved power method for tensor decomposition
• Computer Science, Mathematics
• ArXiv
• 2021
This work derives quantitative bounds such that any second-order critical point with SPM objective value exceeding the bound must equal a tensor component in the noiseless case, and must approximate a tensor component in the noisy case, implying that SPM with suitable initialization is a provable, efficient, robust algorithm for low-rank symmetric tensor decomposition.

Affine symmetries and neural network identifiability
• Computer Science, Mathematics
• 2021
This work exhibits a class of “tanh-type” nonlinearities (including the tanh function itself) for which such a network $A$ does not exist, thereby solving the identifiability question for these nonlinearities in full generality.
Going Beyond Linear RL: Sample Efficient Neural Function Approximation
The focus of this work is function approximation with two-layer neural networks (considering both ReLU and polynomial activation functions), where the results significantly improve upon what can be attained with linear (or eluder dimension) methods.

Estimating multi-index models with response-conditional least squares
• Mathematics
• 2020
The multi-index model is a simple yet powerful high-dimensional regression model which circumvents the curse of dimensionality assuming $\mathbb{E}[Y \mid X] = g(A^\top X)$ for some unknown index…

Subspace power method for symmetric tensor decomposition and generalized PCA
• Mathematics, Computer Science
• ArXiv
• 2019
Numerical simulations indicate that SPM significantly outperforms state-of-the-art algorithms in terms of speed, while performing robustly for low-rank tensors subjected to additive noise.

## References

Showing 1–10 of 81 references.

Robust and Resource Efficient Identification of Two Hidden Layer Neural Networks
• Computer Science, Mathematics
• ArXiv
• 2019
This work addresses the structure identification and the uniform approximation of two fully nonlinear layer neural networks of the type $f(x)=1^T h(B^T g(A^T x))$, and provides guarantees of stable recovery under a posteriori verifiable conditions.
Approximation by ridge functions and neural networks
We investigate the efficiency of approximation by linear combinations of ridge functions in the metric of $L_2(B^d)$, with $B^d$ the unit ball in $\mathbb{R}^d$. If $X_n$ is an $n$-dimensional linear space of univariate
Entropy and Sampling Numbers of Classes of Ridge Functions
• Mathematics
• 2013
We study the properties of ridge functions $f(x)=g(a\cdot x)$ in high dimensions $d$ from the viewpoint of approximation theory. The function classes considered consist of ridge
Breaking the Curse of Dimensionality with Convex Neural Networks
• F. Bach
• Computer Science, Mathematics
J. Mach. Learn. Res.
• 2017
This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units, and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
Capturing Ridge Functions in High Dimensions from Point Queries
• Mathematics
• 2012
Constructing a good approximation to a function of many variables suffers from the “curse of dimensionality”. Namely, functions on $\mathbb{R}^N$ with smoothness of order $s$ can in general be captured with
Learning Functions of Few Arbitrary Linear Parameters in High Dimensions
• Mathematics, Computer Science
Found. Comput. Math.
• 2012
The approach uses tools taken from the compressed sensing framework, recent Chernoff bounds for sums of positive semidefinite matrices, and classical stability bounds for invariant subspaces of singular value decompositions for computing the approximating function, whose complexity is at most polynomial in the dimension d and in the number m of points.
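The subspace idea underlying this line of work can be sketched numerically: gradients of a sum of ridge functions lie in the span of the directions $a_i$, so a singular value decomposition of approximate gradients taken at random points recovers that span. This is a simplified toy illustration, not the paper's algorithm (which combines compressed sensing with these stability bounds); profiles and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, eps = 20, 2, 1e-5  # illustrative sizes and finite-difference step

A = np.linalg.qr(rng.standard_normal((d, m)))[0]  # true directions a_1, a_2
f = lambda x: np.tanh(A[:, 0] @ x) + np.sin(A[:, 1] @ x)

def grad(x):
    """Central finite-difference gradient; lies near span{a_1, ..., a_m}."""
    e = np.eye(d)
    return np.array([(f(x + eps * e[i]) - f(x - eps * e[i])) / (2 * eps)
                     for i in range(d)])

# Stack gradients at random points and take the top-m right singular vectors.
G = np.array([grad(rng.standard_normal(d)) for _ in range(50)])  # (50, d)
_, _, Vt = np.linalg.svd(G, full_matrices=False)
A_hat = Vt[:m].T  # orthonormal basis of the estimated active subspace

# Compare orthogonal projectors: the error should be near zero.
err = np.linalg.norm(A @ A.T - A_hat @ A_hat.T)
```

The projector comparison is basis-independent, which matters because the SVD recovers only the span of the $a_i$, not the directions themselves.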
Greedy Layer-Wise Training of Deep Networks
• Computer Science
NIPS
• 2006
These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit
• Mathematics, Physics
COLT
• 2019
This paper shows that the number of hidden units only needs to be larger than a quantity dependent on the regularity properties of the data, and independent of the dimensions, and generalizes this analysis to the case of unbounded activation functions.
Ridgelets: estimating with ridge functions
Feedforward neural networks, projection pursuit regression, and more generally, estimation via ridge functions have been proposed as an approach to bypass the curse of dimensionality and are now
On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition
• Computer Science, Mathematics
AISTATS
• 2019
It is proved that learning a two-layers neural network that generalizes well is at least as hard as tensor decomposition.
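The kind of reduction this connection rests on identifies the hidden-layer weights with the components of a low-rank symmetric tensor. A minimal sketch of the object involved (sizes made up; this is not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 6, 2  # illustrative input dimension and number of hidden units

# Unit-norm weight vectors a_i of a two-layer network.
A = rng.standard_normal((m, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)

# Rank-m symmetric tensor T = sum_i a_i (x) a_i (x) a_i; recovering the a_i
# from T is exactly symmetric tensor decomposition.
T = sum(np.einsum('i,j,k->ijk', a, a, a) for a in A)
```

Hardness then transfers in the stated direction: an algorithm that learns the network well would yield the components $a_i$, i.e., a tensor decomposition.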