# Generalization Bounds for Neural Networks through Tensor Factorization

@article{Janzamin2015GeneralizationBF,
  title   = {Generalization Bounds for Neural Networks through Tensor Factorization},
  author  = {Majid Janzamin and Hanie Sedghi and Anima Anandkumar},
  journal = {ArXiv},
  year    = {2015},
  volume  = {abs/1506.08473}
}

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for training a two-layer neural network. We prove efficient generalization bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide…
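The workhorse behind such tensor-decomposition training methods is the power iteration for (approximately) orthogonal symmetric tensors: the rank-one components of a third-order moment tensor correspond to the hidden-layer weight vectors. Below is a minimal NumPy sketch of that iteration on a synthetic tensor; the setup (orthonormal components, noiseless tensor, names like `tensor_power_iteration`) is illustrative, not the paper's full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3

# Orthonormal ground-truth components a_1, ..., a_k (columns of A).
A = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :k]
weights = np.array([3.0, 2.0, 1.0])

# Symmetric rank-k tensor T = sum_i w_i * a_i (x) a_i (x) a_i.
T = np.einsum('i,ai,bi,ci->abc', weights, A, A, A)

def tensor_power_iteration(T, n_iters=100, seed=1):
    """Recover one component of an orthogonal symmetric 3rd-order tensor."""
    v = np.random.default_rng(seed).standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        v = np.einsum('abc,b,c->a', T, v, v)  # the map v -> T(I, v, v)
        v /= np.linalg.norm(v)
    lam = np.einsum('abc,a,b,c->', T, v, v, v)  # eigenvalue estimate T(v, v, v)
    return lam, v

lam, v = tensor_power_iteration(T)
```

With a generic random start, the iteration converges (quadratically) to one of the components a_i, and `lam` estimates the corresponding weight w_i; recovering all k components additionally requires deflation or repeated restarts.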

#### 20 Citations

Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds

- Computer Science, Mathematics
- COLT
- 2019

An agnostic learning guarantee is given for GD: starting from a randomly initialized network, it converges in mean squared loss to the minimum error of the best approximation of the target function using a polynomial of degree at most $k$.

On the Learnability of Fully-Connected Neural Networks

- Computer Science
- AISTATS
- 2017

This paper characterizes the learnability of fully-connected neural networks via both positive and negative results, and establishes a hardness result showing that the exponential dependence on 1/ε is unavoidable unless RP = NP.

Convexified Convolutional Neural Networks

- Computer Science, Mathematics
- ICML
- 2017

For learning two-layer convolutional neural networks, it is proved that the generalization error obtained by a convexified CNN converges to that of the best possible CNN.

On the Convergence, Generalization and Recovery Guarantees of Deep Neural Networks

- 2018

Deep neural networks learn hierarchical representations of data using multiple layers of linear transformations and non-linear activation functions. Convolutional networks incorporate learnable…

L1-regularized Neural Networks are Improperly Learnable in Polynomial Time

- Mathematics, Computer Science
- ICML
- 2016

A kernel-based method is given that, with probability at least 1 - δ, learns a predictor whose generalization error is at most ε worse than that of the neural network, implying that any sufficiently sparse neural network is learnable in polynomial time.

Train faster, generalize better: Stability of stochastic gradient descent

- Computer Science, Mathematics
- ICML
- 2016

We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically…

Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks

- Mathematics, Computer Science
- ArXiv
- 2018

We analyze Gradient Descent applied to learning a bounded target function on $n$ real-valued inputs by training a neural network with a single hidden layer of nonlinear gates. Our main finding is…

On the Complexity of Learning Neural Networks

- Computer Science, Mathematics
- NIPS
- 2017

A comprehensive lower bound is demonstrated ruling out the possibility that data generated by neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently; the bound is robust to small perturbations of the true weights.

Tensor Contraction Layers for Parsimonious Deep Nets

- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
- 2017

This paper proposes the Tensor Contraction Layer (TCL), the first attempt to incorporate tensor contractions as end-to-end trainable neural network layers, and investigates several ways to apply them to activation tensors.

Tensor Contraction & Regression Networks

- Computer Science
- 2018

Tensor contraction layers are introduced which can replace the ordinary fully-connected layers in a neural network, along with tensor regression layers, which express the output of a neural network as a low-rank multi-linear mapping from a high-order activation tensor to the softmax layer.

#### References

Showing 1–10 of 30 references

Learning Polynomials with Neural Networks

- Mathematics, Computer Science
- ICML
- 2014

This paper shows that, for a randomly initialized neural network with sufficiently many hidden units, the generic gradient descent algorithm learns any low-degree polynomial, and that if complex-valued weights are used, there are no "robust local minima".

Hardness results for neural network approximation problems

- Mathematics, Computer Science
- Theor. Comput. Sci.
- 2002

It is NP-hard to find a linear threshold network of a fixed size that approximately minimizes the proportion of misclassified examples in a training set, even if there is a network that correctly classifies all of the training examples.

Neural networks and principal component analysis: Learning from examples without local minima

- Mathematics, Computer Science
- Neural Networks
- 1989

The main result is a complete description of the landscape attached to E in terms of principal component analysis, showing that E has a unique minimum corresponding to the projection onto the subspace generated by the first principal vectors of a covariance matrix associated with the training patterns.

Provable Methods for Training Neural Networks with Sparse Connectivity

- Computer Science, Mathematics
- ICLR
- 2015

Novel guaranteed approaches for training feedforward neural networks with sparse connectivity are provided; under mild conditions, their factorization provably yields the weight matrix of the first layer of a deep network.

Learning Overcomplete Latent Variable Models through Tensor Methods

- Computer Science, Mathematics
- COLT
- 2015

The main tool is a new algorithm for tensor decomposition that works in the overcomplete regime; a simple initialization algorithm based on SVD of the tensor slices is also proposed, with guarantees provided under the stricter condition that k ≤ βd.

Smoothed analysis of tensor decompositions

- Mathematics, Computer Science
- STOC
- 2014

This work introduces a smoothed analysis model for studying generative models, develops an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension), and shows that tensor products of perturbed vectors are linearly independent in a robust sense.

Tensor decompositions for learning latent variable models

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2014

A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices, and implies a robust and computationally tractable estimation approach for several popular latent variable models.

On the Computational Efficiency of Training Neural Networks

- Computer Science, Mathematics
- NIPS
- 2014

This paper revisits the computational complexity of training neural networks from a modern perspective and provides both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.

Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates

- Mathematics, Computer Science
- ArXiv
- 2014

In this paper, local and global convergence guarantees for recovering the CP (CANDECOMP/PARAFAC) tensor decomposition are provided, and a tight perturbation analysis given a noisy tensor is included.

Training a Single Sigmoidal Neuron Is Hard

- Mathematics, Computer Science
- Neural Comput.
- 2002

It is proved that the simplest architecture containing only a single neuron that applies a sigmoidal activation function sigma, satisfying certain natural axioms, to the weighted sum of n inputs is hard to train.