Corpus ID: 13597072

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

@article{Janzamin2017BeatingTP,
  title={Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods},
  author={Majid Janzamin and Hanie Sedghi and Anima Anandkumar},
  journal={arXiv: Learning},
  year={2017}
}
Author(s): Janzamin, Majid; Sedghi, Hanie; Anandkumar, Anima

Abstract: Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons…
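The abstract describes training a two-layer network by decomposing a moment tensor rather than by running gradient descent on a non-convex loss. The Python/NumPy sketch below is a hypothetical illustration of that general idea, not the authors' algorithm or its guarantees: it assumes standard Gaussian inputs (so the third-order score function S_3(x) reduces to a Hermite-type polynomial), noiseless labels y = Σ_j a_j tanh(w_j · x) with near-orthonormal weight vectors w_j, and a plain deflation-based tensor power method for recovering the weight directions from the empirical cross-moment E[y · S_3(x)].

```python
# Hypothetical sketch (not the paper's algorithm): estimate the hidden-layer
# weight directions of a two-layer network y = sum_j a_j * tanh(w_j . x) by
# decomposing the empirical cross-moment tensor E[y * S_3(x)], where S_3 is
# the third-order score function of the input distribution. Assumes x ~ N(0, I)
# and near-orthonormal weight vectors, so a simple deflation-based tensor
# power method suffices.
import numpy as np


def moment_tensor(X, y):
    """Empirical E[y * S_3(x)] for standard Gaussian inputs.

    In population, a Stein-type identity gives
    E[y * S_3(x)] = sum_j a_j E[tanh'''(w_j . x)] w_j (x) w_j (x) w_j,
    so the CP factors of this tensor align with the hidden weight vectors.
    """
    n, d = X.shape
    I = np.eye(d)
    T = np.einsum('n,ni,nj,nk->ijk', y, X, X, X) / n   # E[y * x (x) x (x) x]
    m1 = (y[:, None] * X).mean(axis=0)                 # E[y * x]
    T -= np.einsum('i,jk->ijk', m1, I)                 # subtract symmetrized
    T -= np.einsum('j,ik->ijk', m1, I)                 # E[y x] (x) I terms
    T -= np.einsum('k,ij->ijk', m1, I)
    return T


def tensor_power_method(T, k, n_iter=200, seed=0):
    """Recover k rank-1 components by power iteration with deflation.

    Adequate only when the true components are (near-)orthogonal, which this
    sketch assumes; general settings need whitening or other preprocessing.
    """
    rng = np.random.default_rng(seed)
    d = T.shape[0]
    weights, comps = [], []
    for _ in range(k):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v = np.einsum('ijk,j,k->i', T, v, v)        # T(I, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)      # T(v, v, v)
        weights.append(lam)
        comps.append(v)
        T = T - lam * np.einsum('i,j,k->ijk', v, v, v)  # deflate
    return np.array(weights), np.array(comps)


# Toy check: d = 6 inputs, k = 3 hidden units with orthonormal weight rows.
if __name__ == "__main__":
    d, k, n = 6, 3, 50_000
    rng = np.random.default_rng(1)
    W = np.linalg.qr(rng.standard_normal((d, k)))[0].T  # k x d, orthonormal rows
    a = rng.uniform(0.5, 1.5, size=k)                   # output-layer weights
    X = rng.standard_normal((n, d))
    y = (a * np.tanh(X @ W.T)).sum(axis=1)
    _, V = tensor_power_method(moment_tensor(X, y), k)
    # Rows of |V W^T| should be close to a permutation matrix (match up to sign).
    print(np.abs(V @ W.T).round(2))
```

With enough samples, each recovered row of V should align, up to sign, with a row of W. Estimating the output-layer coefficients and handling non-orthogonal, non-Gaussian, or overcomplete settings require the whitening steps, general score functions, and sample-complexity analysis developed in the paper and in several of the references below (e.g., the alternating rank-1 updates and score-function feature papers).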
Citations

Towards Provable Learning of Polynomial Neural Networks Using Low-Rank Matrix Estimation
TLDR: This work proposes two novel, non-convex training algorithms which do not need any extra tuning parameters other than the number of hidden neurons, and uses a lifting trick to borrow algorithmic ideas from low-rank matrix estimation.
Fast and Provable Algorithms for Learning Two-Layer Polynomial Neural Networks
TLDR: This paper considers two-layer networks with quadratic activations, and focuses on the under-parameterized regime where the number of neurons in the hidden layer is smaller than the dimension of the input.
Globally Optimal Training of Generalized Polynomial Neural Networks with Nonlinear Spectral Methods
TLDR: This work shows, under quite weak assumptions on the data, that a particular class of feedforward neural networks can be trained to global optimality at a linear convergence rate using a nonlinear spectral method; this is the first practically feasible method that achieves such a guarantee.
Recovery Guarantees for One-hidden-layer Neural Networks
TLDR: This work distills some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective, and provides recovery guarantees for 1NNs with both sample complexity and computational complexity $\mathit{linear}$ in the input dimension and $\mathit{logarithmic}$ in the precision.
No Spurious Local Minima in Deep Quadratic Networks
TLDR: It is proved that deep over-parameterized neural networks with quadratic activations benefit from similarly nice landscape properties, and convergence to a global minimum for these problems is empirically demonstrated.
Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels
TLDR: This is the first work that provides recovery guarantees for CNNs with multiple kernels under polynomial sample and computational complexities, and it shows that tensor methods are able to initialize the parameters into the locally strongly convex region.
Learning Deep ReLU Networks Is Fixed-Parameter Tractable
TLDR: An algorithm is given whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters; the bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant of the overall network.
Diversity Leads to Generalization in Neural Networks
TLDR: It is shown that, despite the non-convexity of the loss function, neural networks with diverse units can learn the true function, and a novel regularization function is suggested to promote unit diversity for potentially better generalization ability.
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable.
On the Quality of the Initial Basin in Overspecified Neural Networks
TLDR: This work studies the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters, and identifies some conditions under which it becomes more favorable to optimization.

References

Showing 1-10 of 54 references
Train faster, generalize better: Stability of stochastic gradient descent
We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable.
Global Optimality in Tensor Factorization, Deep Learning, and Beyond
TLDR: This framework derives sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum, and shows that if the size of the factorized variables is large enough, then from any initialization it is possible to find a global minimizer using a purely local descent algorithm.
Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates
TLDR: This paper provides local and global convergence guarantees for recovering CP (CANDECOMP/PARAFAC) tensor decompositions, and includes a tight perturbation analysis for noisy tensors.
L1-regularized Neural Networks are Improperly Learnable in Polynomial Time
TLDR: A kernel-based method is proposed such that, with probability at least 1 - δ, it learns a predictor whose generalization error is at most ε worse than that of the neural network; this implies that any sufficiently sparse neural network is learnable in polynomial time.
Learning Polynomials with Neural Networks
TLDR: This paper shows that, for a randomly initialized neural network with sufficiently many hidden units, the generic gradient descent algorithm learns any low-degree polynomial, and that if complex-valued weights are used, there are no "robust local minima".
Learning Overcomplete Latent Variable Models through Tensor Methods
TLDR: The main tool is a new algorithm for tensor decomposition that works in the overcomplete regime; a simple initialization algorithm based on SVD of the tensor slices is proposed, and guarantees are provided under the stricter condition that k ≤ βd.
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
TLDR: This paper proposes a new approach to second-order optimization, the saddle-free Newton method, which can rapidly escape high-dimensional saddle points, unlike gradient descent and quasi-Newton methods; it applies this algorithm to deep and recurrent neural network training and provides numerical evidence of its superior optimization performance.
Smoothed analysis of tensor decompositions
TLDR: This work introduces a smoothed analysis model for studying generative models, develops an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension), and shows that tensor products of perturbed vectors are linearly independent in a robust sense.
Hardness results for neural network approximation problems
TLDR: It is NP-hard to find a linear threshold network of a fixed size that approximately minimizes the proportion of misclassified examples in a training set, even if there is a network that correctly classifies all of the training examples.
Score Function Features for Discriminative Learning: Matrix and Tensor Framework
TLDR: This paper considers a novel class of matrix- and tensor-valued features, which can be pre-trained using unlabeled samples, and presents efficient algorithms for extracting discriminative information given these pre-trained features and labeled samples for any related task.