Corpus ID: 18373862

Generalization Bounds for Neural Networks through Tensor Factorization

@article{Janzamin2015GeneralizationBF,
  title={Generalization Bounds for Neural Networks through Tensor Factorization},
  author={Majid Janzamin and Hanie Sedghi and Anima Anandkumar},
  journal={ArXiv},
  year={2015},
  volume={abs/1506.08473}
}
Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for training a two-layer neural network. We prove efficient generalization bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide…
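To make the tensor-decomposition idea concrete, here is a minimal NumPy sketch of the kind of pipeline the abstract alludes to: form the cross-moment tensor between the output and the third-order score function of the input, decompose it, and read off the first-layer weight directions. Everything here is an illustrative assumption rather than the authors' exact procedure: inputs are standard Gaussian (so the score function reduces to the third Hermite tensor), the target is a noiseless two-layer network without biases, and `cp_als` is a plain alternating-least-squares CP routine written just for this sketch.

```python
# Illustrative sketch (not the paper's exact algorithm): recover the first-layer
# weights of y = a^T sigmoid(W x) by decomposing the empirical cross-moment
# tensor T = E[y * S3(x)], where S3 is the third-order score function of the
# input. For x ~ N(0, I), S3 is the third Hermite tensor:
#   S3(x)_{abc} = x_a x_b x_c - x_a delta_{bc} - x_b delta_{ac} - x_c delta_{ab}.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 6, 3, 500_000                  # input dim, hidden units, samples

# Ground-truth two-layer network (realizable, noiseless, no biases).
W_true = rng.normal(size=(k, d))
W_true /= np.linalg.norm(W_true, axis=1, keepdims=True)
a_true = rng.uniform(1.0, 2.0, size=k)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(n, d))
y = sigmoid(X @ W_true.T) @ a_true

# Empirical cross-moment tensor. By a Stein-type identity it is approximately
# sum_j a_j * E[sigmoid'''(w_j . x)] * w_j (x) w_j (x) w_j, i.e. low CP rank.
I_d = np.eye(d)
T = np.einsum('i,ia,ib,ic->abc', y, X, X, X) / n
m1 = (y @ X) / n
T -= (np.einsum('a,bc->abc', m1, I_d)
      + np.einsum('b,ac->abc', m1, I_d)
      + np.einsum('c,ab->abc', m1, I_d))

def cp_als(T, rank, iters=200):
    """Plain alternating-least-squares CP decomposition of a cubic 3rd-order tensor."""
    dim = T.shape[0]
    A, B, C = [rng.normal(size=(dim, rank)) for _ in range(3)]
    for _ in range(iters):
        for mode in range(3):
            Tm = np.moveaxis(T, mode, 0).reshape(dim, -1)        # mode-n unfolding
            P, Q = [(B, C), (A, C), (A, B)][mode]                # fixed factors
            M = np.einsum('br,cr->bcr', P, Q).reshape(-1, rank)  # Khatri-Rao product
            F = np.linalg.lstsq(M, Tm.T, rcond=None)[0].T        # least-squares update
            if mode == 0:   A = F
            elif mode == 1: B = F
            else:           C = F
    return A

A = cp_als(T, k)
W_hat = (A / np.linalg.norm(A, axis=0)).T        # estimated directions (up to sign)

# Each row of |W_hat @ W_true.T| should have one entry close to 1, i.e. the
# directions are recovered up to sign and permutation (quality grows with n).
print(np.round(np.abs(W_hat @ W_true.T), 2))
```

With these modest sample sizes the recovery is only approximate, and a full method would still have to resolve signs and estimate biases and the output layer; the point of the sketch is just that the moment tensor is approximately a sum of rank-one terms in the rows of the first-layer weight matrix, which is what makes the problem amenable to tensor decomposition.
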
Citations

Gradient Descent for One-Hidden-Layer Neural Networks: Polynomial Convergence and SQ Lower Bounds
An agnostic learning guarantee is given for GD: starting from a randomly initialized network, it converges in mean squared loss to the minimum error of the best approximation of the target function using a polynomial of degree at most $k$.
On the Learnability of Fully-Connected Neural Networks
This paper characterizes the learnability of fully-connected neural networks via both positive and negative results, and establishes a hardness result showing that the exponential dependence on 1/ε is unavoidable unless RP = NP.
Convexified Convolutional Neural Networks
For learning two-layer convolutional neural networks, it is proved that the generalization error obtained by a convexified CNN converges to that of the best possible CNN.
On the Convergence, Generalization and Recovery Guarantees of Deep Neural Networks
Deep neural networks learn hierarchical representations of data using multiple layers of linear transformations and non-linear activation functions. Convolutional networks incorporate learnable…
L1-regularized Neural Networks are Improperly Learnable in Polynomial Time
A kernel-based method is given such that, with probability at least 1 − δ, it learns a predictor whose generalization error is at most ε worse than that of the neural network; this implies that any sufficiently sparse neural network is learnable in polynomial time.
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable.
Polynomial Convergence of Gradient Descent for Training One-Hidden-Layer Neural Networks
We analyze Gradient Descent applied to learning a bounded target function on $n$ real-valued inputs by training a neural network with a single hidden layer of nonlinear gates. Our main finding is…
On the Complexity of Learning Neural Networks
A comprehensive lower bound is demonstrated, ruling out the possibility that data generated by neural networks with a single hidden layer, smooth activation functions and benign input distributions can be learned efficiently; the lower bound is robust to small perturbations of the true weights.
Tensor Contraction Layers for Parsimonious Deep Nets
This paper proposes the Tensor Contraction Layer (TCL), the first attempt to incorporate tensor contractions as end-to-end trainable neural network layers, and investigates several ways to apply them to activation tensors.
Tensor Contraction & Regression Networks
Tensor contraction layers are introduced, which can replace the ordinary fully-connected layers in a neural network, along with tensor regression layers, which express the output of a neural network as a low-rank multi-linear mapping from a high-order activation tensor to the softmax layer.
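The two entries above describe concrete layer types; the NumPy sketch below shows roughly what their forward passes compute: a Tucker-style tensor contraction of an activation tensor, followed by a low-rank CP tensor regression to class scores. All shapes, ranks, and variable names are assumptions for illustration; in the cited papers these factors are trainable parameters inside a deep network rather than fixed random matrices.

```python
# Forward-pass sketch (illustration only) of a tensor contraction layer (TCL)
# and a tensor regression layer (TRL) acting on an activation tensor of shape
# (batch, H, W, C), e.g. the output of a convolutional block.
import numpy as np

rng = np.random.default_rng(0)
batch, H, W, C = 8, 12, 12, 32          # activation tensor dimensions (assumed)
rH, rW, rC = 4, 4, 8                    # contraction ranks (assumed)
n_classes = 10

X = rng.normal(size=(batch, H, W, C))   # high-order activations

# --- Tensor contraction layer: contract each non-batch mode with a small
# factor matrix, preserving the multi-linear structure instead of flattening.
U_H = rng.normal(size=(rH, H))
U_W = rng.normal(size=(rW, W))
U_C = rng.normal(size=(rC, C))
X_tcl = np.einsum('bhwc,ih,jw,kc->bijk', X, U_H, U_W, U_C)   # (batch, rH, rW, rC)

# --- Tensor regression layer: map the contracted tensor to class scores with a
# low-rank (here rank-R CP) weight tensor instead of a dense fully-connected layer.
R = 5
A = rng.normal(size=(rH, R))
B = rng.normal(size=(rW, R))
Cf = rng.normal(size=(rC, R))
D = rng.normal(size=(n_classes, R))
bias = np.zeros(n_classes)
# logits[b, o] = sum_{i,j,k,r} X_tcl[b,i,j,k] * A[i,r] * B[j,r] * Cf[k,r] * D[o,r] + bias[o]
logits = np.einsum('bijk,ir,jr,kr,or->bo', X_tcl, A, B, Cf, D) + bias
print(logits.shape)                     # (8, 10)
```

The design point both papers make is parameter parsimony: a dense fully-connected map from the contracted tensor to the logits would need rH·rW·rC·n_classes weights, whereas the rank-R form above needs only R·(rH + rW + rC + n_classes).
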

References

(Showing 1-10 of 30 references.)
Learning Polynomials with Neural Networks
This paper shows that for a randomly initialized neural network with sufficiently many hidden units, the generic gradient descent algorithm learns any low-degree polynomial, and that if complex-valued weights are used, there are no "robust local minima".
Hardness results for neural network approximation problems
It is NP-hard to find a linear threshold network of a fixed size that approximately minimizes the proportion of misclassified examples in a training set, even if there is a network that correctly classifies all of the training examples.
Neural networks and principal component analysis: Learning from examples without local minima
The main result is a complete description of the landscape attached to E in terms of principal component analysis, showing that E has a unique minimum corresponding to the projection onto the subspace generated by the first principal vectors of a covariance matrix associated with the training patterns.
Provable Methods for Training Neural Networks with Sparse Connectivity
Novel guaranteed approaches for training feedforward neural networks with sparse connectivity are provided; under mild conditions, the proposed factorization provably yields the weight matrix of the first layer of a deep network.
Learning Overcomplete Latent Variable Models through Tensor Methods
The main tool is a new algorithm for tensor decomposition that works in the overcomplete regime; a simple initialization algorithm based on SVD of the tensor slices is proposed, and guarantees are provided under the stricter condition that k ≤ βd.
Smoothed analysis of tensor decompositions
This work introduces a smoothed analysis model for studying generative models, develops an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension), and shows that tensor products of perturbed vectors are linearly independent in a robust sense.
Tensor decompositions for learning latent variable models
A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices; this implies a robust and computationally tractable estimation approach for several popular latent variable models.
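For orientation, the core subroutine analyzed in this reference, the tensor power iteration with deflation, can be sketched in a few lines. The version below is a bare-bones illustration on a synthetic symmetric tensor with orthonormal components and a small perturbation; the paper's robust variant adds careful restart, stopping, and perturbation arguments that are omitted here.

```python
# Minimal sketch of the tensor power method with deflation for a symmetric
# 3rd-order tensor with (near-)orthonormal components.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 4

# Synthetic tensor T = sum_j lam_j * v_j (x) v_j (x) v_j with orthonormal v_j.
V, _ = np.linalg.qr(rng.normal(size=(d, d)))
V = V[:, :k]
lam = rng.uniform(1.0, 2.0, size=k)
T = np.einsum('j,aj,bj,cj->abc', lam, V, V, V)
T += 1e-3 * rng.normal(size=(d, d, d))           # small perturbation/noise

def power_iteration(T, n_iters=100, n_restarts=10):
    """Estimate the top component and weight of a symmetric 3rd-order tensor."""
    best_u, best_val = None, -np.inf
    for _ in range(n_restarts):                  # restarts guard against bad starts
        u = rng.normal(size=T.shape[0])
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = np.einsum('abc,b,c->a', T, u, u) # u <- T(I, u, u)
            u /= np.linalg.norm(u)
        val = np.einsum('abc,a,b,c->', T, u, u, u)
        if val > best_val:
            best_u, best_val = u, val
    return best_u, best_val

components, weights = [], []
T_res = T.copy()
for _ in range(k):
    u, lam_hat = power_iteration(T_res)
    components.append(u)
    weights.append(lam_hat)
    # Deflate: subtract the recovered rank-one term and repeat on the residual.
    T_res = T_res - lam_hat * np.einsum('a,b,c->abc', u, u, u)

U = np.stack(components)                         # rows ~ true components up to sign
print(np.round(np.abs(U @ V), 2))                # near-permutation matrix expected
```

For non-orthogonal components (as in the neural-network setting above), one typically whitens the tensor with second-order moments first, or uses alternating rank-1 updates as in the non-orthogonal decomposition reference below.
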
On the Computational Efficiency of Training Neural Networks
This paper revisits the computational complexity of training neural networks from a modern perspective and provides both positive and negative results, some of which yield new provably efficient and practical algorithms for training certain types of neural networks.
Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates
Local and global convergence guarantees for recovering a CP (Candecomp/Parafac) tensor decomposition are provided, together with a tight perturbation analysis for noisy tensors.
Training a Single Sigmoidal Neuron Is Hard
  • J. Síma
  • Neural Comput., 2002
It is proved that the simplest architecture, containing only a single neuron that applies a sigmoidal activation function σ (satisfying certain natural axioms) to the weighted sum of n inputs, is hard to train.