# L1-regularized Neural Networks are Improperly Learnable in Polynomial Time

@inproceedings{Zhang2016L1regularizedNN, title={L1-regularized Neural Networks are Improperly Learnable in Polynomial Time}, author={Yuchen Zhang and J. Lee and Michael I. Jordan}, booktitle={ICML}, year={2016} }

We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has k hidden layers and that the l1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method such that, with probability at least 1 - δ, it learns a predictor whose generalization error is at most ε worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in…
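At a high level, improper learning via kernels replaces the non-convex class of bounded-weight networks with a larger RKHS that contains it, then runs a convex method over that space. The sketch below is an illustration of this idea, not the paper's exact construction: it assumes the inverse-polynomial kernel k(x, x') = 1/(2 - ⟨x, x'⟩) on unit-norm inputs (a kernel of the kind used in this line of work, and an assumption here), fit with plain kernel ridge regression.

```python
import numpy as np

def inverse_poly_kernel(X, Z):
    """Kernel k(x, z) = 1 / (2 - <x, z>), valid when ||x||, ||z|| <= 1.

    Its Taylor expansion sum_j <x, z>^j / 2^(j+1) has positive weight on
    every polynomial degree, which is what lets the induced RKHS contain
    smooth approximations of neurons with bounded incoming weights.
    """
    G = X @ Z.T                # pairwise inner products, each in [-1, 1]
    return 1.0 / (2.0 - G)     # denominator stays in [1, 3]: no blow-up

def kernel_ridge_fit(X, y, lam=1e-3):
    """Solve (K + lam*I) alpha = y; predictor is f(z) = k(z, X) @ alpha."""
    K = inverse_poly_kernel(X, X)
    return np.linalg.solve(K + lam * np.eye(X.shape[0]), y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # project inputs to unit sphere

# Hypothetical target: a single neuron with small l1 incoming weights.
w = np.array([0.5, -0.3, 0.2, 0.0, 0.0])
y = np.tanh(X @ w)

alpha = kernel_ridge_fit(X, y)

Xtest = rng.normal(size=(50, 5))
Xtest /= np.linalg.norm(Xtest, axis=1, keepdims=True)
pred = inverse_poly_kernel(Xtest, X) @ alpha
mse = np.mean((pred - np.tanh(Xtest @ w)) ** 2)
```

The learned predictor is a kernel expansion rather than a neural network, which is exactly what "improper" means: the output hypothesis lives outside the target class, but its error is compared against the best network in that class.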

#### 82 Citations

Learning Deep ReLU Networks Is Fixed-Parameter Tractable

- Computer Science, Mathematics
- ArXiv
- 2020

An algorithm is given whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters; its bounds depend on the number of hidden units, the depth, the spectral norm of the weight matrices, and the Lipschitz constant of the overall network.

Learning Neural Networks with Two Nonlinear Layers in Polynomial Time

- Computer Science
- COLT
- 2019

This work gives a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU), and suggests a new approach to Boolean learning problems via real-valued conditional-mean functions, sidestepping traditional hardness results from computational learning theory.

On the Learnability of Fully-Connected Neural Networks

- Computer Science
- AISTATS
- 2017

This paper characterizes the learnability of fully-connected neural networks via both positive and negative results, and establishes a hardness result showing that the exponential dependence on 1/ε is unavoidable unless RP = NP.

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

- Computer Science, Mathematics
- NeurIPS
- 2019

It is proved that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations, via SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples.

SGD Learns the Conjugate Kernel Class of the Network

- Computer Science, Mathematics
- NIPS
- 2017

We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of…

Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

- Computer Science, Mathematics
- NIPS
- 2017

This work shows that a natural distributional assumption corresponding to *eigenvalue decay* of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g., feed-forward networks of ReLUs).

On the Convergence Rate of Training Recurrent Neural Networks

- Computer Science, Mathematics
- NeurIPS
- 2019

It is shown that when the number of neurons is sufficiently large, meaning polynomial in the training data size, SGD is capable of minimizing the regression loss at a linear convergence rate; this gives theoretical evidence of how RNNs can memorize data.

Principled Deep Neural Network Training through Linear Programming

- Computer Science, Mathematics
- ArXiv
- 2018

This work shows that large classes of deep neural networks with various architectures, activation functions, and loss functions can be trained to near optimality with desired target accuracy using linear programming in time that is exponential in the input data and parameter space dimension and polynomial in the size of the data set.

Recovery Guarantees for One-hidden-layer Neural Networks

- Mathematics, Computer Science
- ICML
- 2017

This work distills some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective, and provides recovery guarantees for 1NNs with both sample complexity and computational complexity *linear* in the input dimension and *logarithmic* in the precision.

Learning Two-layer Neural Networks with Symmetric Inputs

- Computer Science, Mathematics
- ICLR
- 2019

A new algorithm for learning a two-layer neural network under a general class of input distributions is presented; it is based on the method-of-moments framework and extends several results in tensor decompositions to avoid the complicated non-convex optimization in learning neural networks.

#### References

Showing 1-10 of 32 references.

Learning Polynomials with Neural Networks

- Mathematics, Computer Science
- ICML
- 2014

This paper shows that for a randomly initialized neural network with sufficiently many hidden units, the generic gradient descent algorithm learns any low-degree polynomial; it further shows that if complex-valued weights are used, there are no "robust local minima".

Generalization Bounds for Neural Networks through Tensor Factorization

- Computer Science, Mathematics
- ArXiv
- 2015

This work proposes a novel algorithm based on tensor decomposition for training a two-layer neural network, and proves efficient generalization bounds for this method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons.

Training a 3-node neural network is NP-complete

- Computer Science, Mathematics
- Neural Networks
- 1992

It is NP-complete to decide whether there exist weights and thresholds for the three nodes of this network so that it will produce output consistent with a given set of training examples, suggesting that those looking for perfect training algorithms cannot escape inherent computational difficulties just by considering only simple or very regular networks.

On the Computational Efficiency of Training Neural Networks

- Computer Science, Mathematics
- NIPS
- 2014

This paper revisits the computational complexity of training neural networks from a modern perspective and provides both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

- Computer Science
- 2017

This work proposes a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions.

Learning Kernel-Based Halfspaces with the 0-1 Loss

- Mathematics, Computer Science
- SIAM J. Comput.
- 2011

A new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function is described and analyzed, and a hardness result is proved showing that, under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in $L$.

Convex Neural Networks

- Computer Science, Mathematics
- NIPS
- 2005

Training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem, which involves an infinite number of variables but can be solved by incrementally inserting a hidden unit at a time.

Universal approximation bounds for superpositions of a sigmoidal function

- Mathematics, Computer Science
- IEEE Trans. Inf. Theory
- 1993

The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order 1/n^{2/d} uniformly for functions satisfying the same smoothness assumption.

ImageNet classification with deep convolutional neural networks

- Computer Science
- Commun. ACM
- 2012

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes, and employed a recently developed regularization method called "dropout" that proved to be very effective.

Provable Bounds for Learning Some Deep Representations

- Computer Science, Mathematics
- ICML
- 2014

This work gives algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others, based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure.