Corpus ID: 15688894

L1-regularized Neural Networks are Improperly Learnable in Polynomial Time

@inproceedings{Zhang2016L1regularizedNN,
  title={L1-regularized Neural Networks are Improperly Learnable in Polynomial Time},
  author={Yuchen Zhang and J. Lee and Michael I. Jordan},
  booktitle={ICML},
  year={2016}
}
We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has k hidden layers and that the ℓ1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method, such that with probability at least 1 − δ, it learns a predictor whose generalization error is at most ε worse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in…
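The method is kernel-based: rather than fitting the network directly, it fits a predictor in a reproducing kernel Hilbert space rich enough to cover networks with ℓ1-bounded incoming weights, which is what makes the learning "improper". The sketch below is only meant to show the general shape of such a kernel-based improper learner; it uses plain kernel ridge regression with a Gaussian kernel in numpy, not the specific kernel constructed in the paper, and the kernel width, regularization strength, and toy data are assumptions made for the example.

import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Z.
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_kernel_ridge(X, y, gamma=1.0, lam=1e-2):
    # Solve (K + lam * n * I) alpha = y for the dual coefficients alpha.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    # Kernel predictor: f(x) = sum_i alpha_i * k(x_i, x).
    return rbf_kernel(X_new, X_train, gamma) @ alpha

# Toy usage: the labels come from a small "network", but the learner never tries
# to recover that network's weights; it only outputs a kernel predictor.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.tanh(X @ rng.normal(size=5))
alpha = fit_kernel_ridge(X, y)
print(predict(X, alpha, X[:3], gamma=1.0))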
Citations

Learning Deep ReLU Networks Is Fixed-Parameter Tractable
An algorithm is given whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters; the bounds depend on the number of hidden units, the depth, the spectral norms of the weight matrices, and the Lipschitz constant of the overall network.
Learning Neural Networks with Two Nonlinear Layers in Polynomial Time
This work gives a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU), and suggests a new approach to Boolean learning problems via real-valued conditional-mean functions, sidestepping traditional hardness results from computational learning theory.
On the Learnability of Fully-Connected Neural Networks
This paper characterizes the learnability of fully-connected neural networks via both positive and negative results, and establishes a hardness result showing that the exponential dependence on 1/ε is unavoidable unless RP = NP.
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
It is proved that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations, via SGD (stochastic gradient descent) or its variants, in polynomial time and using polynomially many samples.
SGD Learns the Conjugate Kernel Class of the Network
We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of the network.
Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks
This work shows that a natural distributional assumption corresponding to eigenvalue decay of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g., feed-forward networks of ReLUs).
On the Convergence Rate of Training Recurrent Neural Networks
It is shown that when the number of neurons is sufficiently large, meaning polynomial in the training data size, SGD is capable of minimizing the regression loss at a linear convergence rate; this gives theoretical evidence of how RNNs can memorize data.
Principled Deep Neural Network Training through Linear Programming
This work shows that large classes of deep neural networks with various architectures, activation functions, and loss functions can be trained to near optimality with desired target accuracy using linear programming, in time that is exponential in the input data and parameter space dimension and polynomial in the size of the data set.
Recovery Guarantees for One-hidden-layer Neural Networks
This work distills some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective, and provides recovery guarantees for 1NNs with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision.
Learning Two-layer Neural Networks with Symmetric Inputs
A new algorithm is given for learning a two-layer neural network under a general class of input distributions; it is based on the method-of-moments framework and extends several results in tensor decompositions to avoid the complicated non-convex optimization in learning neural networks.

References

SHOWING 1-10 OF 32 REFERENCES
Learning Polynomials with Neural Networks
This paper shows that for a randomly initialized neural network with sufficiently many hidden units, the generic gradient descent algorithm learns any low-degree polynomial; it also shows that if complex-valued weights are used, there are no "robust local minima".
Generalization Bounds for Neural Networks through Tensor Factorization
This work proposes a novel algorithm based on tensor decomposition for training a two-layer neural network, and proves efficient generalization bounds for this method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons.
Training a 3-node neural network is NP-complete
It is NP-complete to decide whether there exist weights and thresholds for the three nodes of this network so that it will produce output consistent with a given set of training examples, suggesting that those looking for perfect training algorithms cannot escape inherent computational difficulties just by considering only simple or very regular networks.
On the Computational Efficiency of Training Neural Networks
This paper revisits the computational complexity of training neural networks from a modern perspective and provides both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.
Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
This work proposes a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions.
Learning Kernel-Based Halfspaces with the 0-1 Loss
A new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function is described and analyzed, and a hardness result is proved showing that, under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in L.
Convex Neural Networks
Training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem, which involves an infinite number of variables but can be solved by incrementally inserting a hidden unit at a time.
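As a rough illustration of the "insert one hidden unit at a time" idea in the entry above (and not that paper's actual infinite-dimensional convex formulation), the following numpy sketch greedily adds ReLU units: each round it samples candidate weight vectors, keeps the one whose activation correlates best with the current residual, and then refits the output layer, which is a convex least-squares problem, over all units chosen so far. The candidate-sampling scheme, the ReLU activation, and all constants are assumptions made for the example.

import numpy as np

def grow_network_greedily(X, y, n_units=10, n_candidates=500, seed=0):
    # Greedy unit insertion: the inner refit over output weights is convex,
    # while new hidden units are proposed by random search over weight vectors.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.empty((0, d))                    # hidden-unit weights chosen so far
    c = np.zeros(0)                         # output weights for those units
    residual = y.copy()
    for _ in range(n_units):
        cand = rng.normal(size=(n_candidates, d))
        acts = np.maximum(X @ cand.T, 0.0)  # ReLU responses of all candidates
        best = np.argmax(np.abs(acts.T @ residual))
        W = np.vstack([W, cand[best]])
        H = np.maximum(X @ W.T, 0.0)        # design matrix of selected units
        c, *_ = np.linalg.lstsq(H, y, rcond=None)
        residual = y - H @ c
    return W, c

# Toy usage on a synthetic regression problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = np.maximum(X[:, 0] - X[:, 1], 0.0) + 0.1 * rng.normal(size=300)
W, c = grow_network_greedily(X, y)
print("training MSE:", np.mean((np.maximum(X @ W.T, 0.0) @ c - y) ** 2))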
Universal approximation bounds for superpositions of a sigmoidal function
A. Barron, IEEE Trans. Inf. Theory, 1993
The approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings, and the integrated squared approximation error cannot be made smaller than order 1/n^{2/d} uniformly for functions satisfying the same smoothness assumption.
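For context, Barron's result is usually stated along the following lines (paraphrased, so treat the exact constants as indicative rather than authoritative): if the Fourier transform of the target f has a finite first moment, then n sigmoidal units achieve squared error of order 1/n on a ball of radius r, whereas linear combinations of n fixed basis functions cannot beat order (1/n)^{2/d} uniformly over the same class.

\[
  f_n(x) \;=\; c_0 + \sum_{j=1}^{n} c_j\,\sigma\!\bigl(a_j^{\top} x + b_j\bigr),
  \qquad
  C_f \;=\; \int_{\mathbb{R}^d} \lVert\omega\rVert\,\bigl|\hat f(\omega)\bigr|\,d\omega < \infty,
\]
\[
  \text{upper bound:}\quad
  \inf_{f_n} \int_{B_r} \bigl(f(x)-f_n(x)\bigr)^2 \,\mu(dx) \;\le\; \frac{(2\,r\,C_f)^2}{n},
  \qquad
  \text{fixed-basis lower bound:}\quad \gtrsim\; n^{-2/d}.
\]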
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes, and employed a recently developed regularization method called "dropout" that proved to be very effective.
Provable Bounds for Learning Some Deep Representations
This work gives algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others, based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure.