Corpus ID: 6713421

Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels

@article{Zhong2017LearningNC,
  title={Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels},
  author={Kai Zhong and Zhao Song and I. Dhillon},
  journal={ArXiv},
  year={2017},
  volume={abs/1711.03440}
}
In this paper, we consider parameter recovery for non-overlapping convolutional neural networks (CNNs) with multiple kernels. We show that when the inputs follow a Gaussian distribution and the sample size is sufficiently large, the squared loss of such CNNs is $\mathit{locally~strongly~convex}$ in a basin of attraction near the global optima for most popular activation functions, such as ReLU, Leaky ReLU, Squared ReLU, Sigmoid and Tanh. The required sample complexity is proportional to the…
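To make the claim concrete, here is a minimal sketch of the non-overlapping CNN model and squared-loss objective the abstract describes; the notation ($t$ kernels $w_1,\dots,w_t$, $k$ non-overlapping patches $x_{P_1},\dots,x_{P_k}$ of an input $x$, activation $\phi$, and the constants $c$ and $r$) is assumed here for illustration rather than quoted from the paper:

$$f(x; W) \;=\; \sum_{j=1}^{t} \sum_{i=1}^{k} \phi\!\left(w_j^\top x_{P_i}\right), \qquad \widehat{L}(W) \;=\; \frac{1}{2n} \sum_{s=1}^{n} \bigl(f(x_s; W) - y_s\bigr)^2,$$

$$\text{local strong convexity near } W^\ast:\quad \nabla^2 \widehat{L}(W) \;\succeq\; c\, I \quad \text{whenever } \|W - W^\ast\|_F \le r.$$

Under this reading, the statement is that for Gaussian inputs and a sufficiently large sample size $n$, $\widehat{L}$ satisfies the second condition in a basin around the ground-truth kernels $W^\ast$, so a gradient-based method initialized inside that basin recovers the parameters.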
Guaranteed Recovery of One-Hidden-Layer Neural Networks via Cross Entropy
It is proved that with Gaussian inputs, the empirical risk based on cross entropy exhibits strong convexity and smoothness uniformly in a local neighborhood of the ground truth, as soon as the sample complexity is sufficiently large.
On the Learning Dynamics of Two-layer Nonlinear Convolutional Neural Networks
This work theoretically and empirically shows that some convolutional filters will learn the key patterns in the data and that the norm of these filters will dominate during training with stochastic gradient descent, so that, with high probability, the CNN model can obtain 100% accuracy over the considered data distributions.
Improved Linear Convergence of Training CNNs With Generalizability Guarantees: A One-Hidden-Layer Case
This is the first work to show that accelerated GD algorithms can find the global optimizer of the nonconvex learning problem of neural networks, and it characterizes the sample complexity of gradient-based methods in learning convolutional neural networks with the nonsmooth ReLU activation function.
Learning One Convolutional Layer with Overlapping Patches
The first provably efficient algorithm, Convotron, is given for learning a one-hidden-layer convolutional network with respect to a general class of (potentially overlapping) patches, and it is proved that the framework captures commonly used schemes from computer vision, including one-dimensional and two-dimensional "patch and stride" convolutions.
Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks
A novel algorithm called approximate gradient descent for training CNNs is proposed, and it is shown that, with high probability, the proposed algorithm with random initialization achieves linear convergence to the ground-truth parameters up to statistical precision.
Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression
This work proves that under Gaussian input, the empirical risk function employing quadratic loss exhibits strong convexity and smoothness uniformly in a local neighborhood of the ground truth, for a class of smooth activation functions satisfying certain properties, including sigmoid and tanh, as soon as the sample complexity is sufficiently large.
Guaranteed Convergence of Training Convolutional Neural Networks via Accelerated Gradient Descent
It is proved that if the inputs follow a Gaussian distribution, then the optimization problem can be solved by an accelerated gradient descent (AGD) algorithm with a well-designed initial point and enough samples, and that the iterates of the AGD algorithm converge linearly to the ground-truth weights.
On the Convergence, Generalization and Recovery Guarantees of Deep Neural Networks
Deep neural networks learn hierarchical representations of data using multiple layers of linear transformations and non-linear activation functions. Convolutional networks incorporate learnable…
A Convergence Theory for Deep Learning via Over-Parameterization
This work proves why stochastic gradient descent can find global minima on the training objective of DNNs in $\textit{polynomial time}$ and implies an equivalence between over-parameterized neural networks and the neural tangent kernel (NTK) in the finite (and polynomial) width setting.
End-to-end Learning of a Convolutional Neural Network via Deep Tensor Decomposition
This paper develops an algorithm for simultaneously learning all the kernels from the training data based on a rank-1 tensor decomposition, and shows that DeepTD is data-efficient and provably works as soon as the sample size exceeds the total number of convolutional weights in the network.

References

Showing 1-10 of 37 references
Recovery Guarantees for One-hidden-layer Neural Networks
This work distills some properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective, and provides recovery guarantees for 1NNs with both sample complexity and computational complexity $\mathit{linear}$ in the input dimension and $\mathit{logarithmic}$ in the precision.
When is a Convolutional Filter Easy To Learn?
It is shown that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time and that the convergence rate depends on the smoothness of the input distribution and the closeness of patches.
Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
This work proposes a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions.
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
This work provides the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations, and shows that learning is NP-complete in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time.
Convexified Convolutional Neural Networks
For learning two-layer convolutional neural networks, it is proved that the generalization error obtained by a convexified CNN converges to that of the best possible CNN.
Convolutional Rectifier Networks as Generalized Tensor Decompositions
Developing effective methods for training convolutional arithmetic circuits may give rise to a deep learning architecture that is provably superior to convolutional rectifier networks but has so far been overlooked by practitioners.
On the Quality of the Initial Basin in Overspecified Neural Networks
This work studies the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters, and identifies some conditions under which it becomes more favorable to optimization.
Diversity Leads to Generalization in Neural Networks
It is shown that, despite the non-convexity of the loss function, neural networks with diverse units can learn the true function, and a novel regularization function is suggested to promote unit diversity for potentially better generalization ability.
Global Optimality in Tensor Factorization, Deep Learning, and Beyond
This framework derives sufficient conditions to guarantee that a local minimum of the non-convex optimization problem is a global minimum, and shows that if the size of the factorized variables is large enough, then from any initialization it is possible to find a global minimizer using a purely local descent algorithm.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.