Corpus ID: 15105362

Learning Halfspaces and Neural Networks with Random Initialization

@article{Zhang2015LearningHA,
  title={Learning Halfspaces and Neural Networks with Random Initialization},
  author={Yuchen Zhang and Jason D. Lee and Martin J. Wainwright and Michael I. Jordan},
  journal={ArXiv},
  year={2015},
  volume={abs/1511.07948}
}
We study non-convex empirical risk minimization for learning halfspaces and neural networks. For loss functions that are $L$-Lipschitz continuous, we present algorithms to learn halfspaces and multi-layer neural networks that achieve arbitrarily small excess risk $\epsilon>0$. The time complexity is polynomial in the input dimension $d$ and the sample size $n$, but exponential in the quantity $(L/\epsilon^2)\log(L/\epsilon)$. These algorithms run multiple rounds of random initialization…
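The abstract's core recipe, several rounds of random initialization each followed by optimization of an $L$-Lipschitz surrogate loss, can be pictured with the minimal sketch below. Everything in it (the ramp-style loss, the projected-gradient inner loop, and all hyperparameters) is an assumption made for illustration, not the paper's actual algorithm or its guarantees.

```python
# Minimal sketch of learning a halfspace by multi-round random initialization
# plus gradient steps on a bounded, L-Lipschitz ramp-style surrogate loss.
# Illustrative only: the loss, step size, and round/step counts are assumptions,
# not the paper's procedure.
import numpy as np

def ramp_loss(margins, L=1.0):
    """Bounded, L-Lipschitz loss of the margin y * <w, x>: 1 for very negative
    margins, 0 for comfortably positive ones, linear in between."""
    return np.clip(0.5 - L * margins, 0.0, 1.0)

def learn_halfspace(X, y, rounds=20, steps=200, lr=0.1, L=1.0, seed=0):
    """X: (n, d) features, y: (n,) labels in {-1, +1}. Returns the unit-norm
    direction with the lowest empirical surrogate risk over all random starts."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_w, best_risk = None, np.inf
    for _ in range(rounds):
        # Fresh random initialization on the unit sphere.
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        for _ in range(steps):
            margins = y * (X @ w)
            # Subgradient of the ramp loss is -L * y * x on the sloped region.
            active = (margins > -0.5 / L) & (margins < 0.5 / L)
            grad = -(L / n) * (X[active] * y[active, None]).sum(axis=0)
            w -= lr * grad
            w /= max(np.linalg.norm(w), 1e-12)  # project back to the unit sphere
        risk = ramp_loss(y * (X @ w), L).mean()
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w, best_risk
```

The exponential factor in the abstract corresponds, roughly, to how many restarts and how much inner optimization a worst-case guarantee would require; the sketch makes no attempt to reproduce that analysis.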
Reliably Learning the ReLU in Polynomial Time
TLDR
A hypothesis is constructed that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\cal{D}$, for any convex, bounded, and Lipschitz loss function.
Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks
TLDR
This work shows that a natural distributional assumption corresponding to {\em eigenvalue decay} of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g. feed-forward networks of ReLUs).
Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks
Let $f^{\star}$ be a function on $\mathbb{R}^d$ with an assumption of a spectral norm $v_{f^{\star}}$. For various noise settings, we show that $\mathbb{E}\|\hat{f} - f^{\star}\|^2 \leq$ …
On the Quality of the Initial Basin in Overspecified Neural Networks
TLDR
This work studies the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters, and identifies some conditions under which it becomes more favorable to optimization.
How Many Samples are Needed to Learn a Convolutional Neural Network?
TLDR
It is shown that for learning an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error of $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas its FNN counterpart needs at least $\Omega(d/\epsilon)$ samples.
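Taking the two stated bounds at face value and dropping constants and logarithmic factors, a quick numerical comparison illustrates the gap; the particular values of $m$, $d$, and $\epsilon$ below are hypothetical.

```latex
% Hypothetical values, only to illustrate the scaling of the two stated bounds;
% constants and logarithmic factors are ignored.
\[
  m = 10^{2},\qquad d = 10^{4},\qquad \epsilon = 10^{-1}
  \;\Longrightarrow\;
  \widetilde{O}\!\left(\tfrac{m}{\epsilon^{2}}\right) \approx 10^{4}
  \quad\text{versus}\quad
  \Omega\!\left(\tfrac{d}{\epsilon}\right) \approx 10^{5}.
\]
```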
Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}, \mathbf{w}, \mathbf{a}) = \sum_j$ …
Distribution-Specific Hardness of Learning Neural Networks
  • O. Shamir
  • Computer Science, Mathematics
  • J. Mach. Learn. Res.
  • 2018
TLDR
This paper identifies a family of simple target functions, which are difficult to learn even if the input distribution is "nice", and provides evidence that neither class of assumptions alone is sufficient. Expand
Convergence Analysis of Two-layer Neural Networks with ReLU Activation
TLDR
A convergence analysis of SGD is provided for a rich subset of two-layer feedforward networks with ReLU activations, characterized by a special structure called "identity mapping"; it is proved that, if the input follows a Gaussian distribution, then with the standard $O(1/\sqrt{d})$ initialization of the weights, SGD converges to the global minimum in a polynomial number of steps.
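As a rough companion to this summary, the sketch below shows one way to write down a two-layer ReLU unit with an identity-mapping skip connection, weights initialized at the standard $O(1/\sqrt{d})$ scale, and plain SGD on the squared loss. The architecture, loss, and hyperparameters are assumptions for the example; the cited paper's exact network, data model, and analysis differ.

```python
# Illustrative two-layer ReLU unit with an identity-mapping skip connection,
# weights drawn at the standard O(1/sqrt(d)) scale and trained by plain SGD
# on the squared loss. Every name and hyperparameter here is an assumption.
import numpy as np

def init_weights(d, rng):
    # Entries of size ~ 1/sqrt(d), so (I + W) x stays well-conditioned at init.
    return rng.standard_normal((d, d)) / np.sqrt(d)

def predict(W, x):
    # "Identity mapping" style unit: ReLU of (I + W) x, coordinates summed.
    return np.maximum(x + W @ x, 0.0).sum()

def sgd(X, y, lr=1e-3, epochs=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = init_weights(d, rng)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, target = X[i], y[i]
            pre = x + W @ x                       # pre-activation (I + W) x
            pred = np.maximum(pre, 0.0).sum()
            # Gradient of 0.5 * (pred - target)^2 with respect to W.
            grad_pre = (pred - target) * (pre > 0).astype(float)
            W -= lr * np.outer(grad_pre, x)
    return W
```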
SGD Learns the Conjugate Kernel Class of the Network
We show that the standard stochastic gradient descent (SGD) algorithm is guaranteed to learn, in polynomial time, a function that is competitive with the best function in the conjugate kernel space of …
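One loose way to picture the conjugate kernel class is as the set of predictors obtained by freezing the randomly initialized hidden layer and fitting only the output layer, i.e., linear regression over random features. The sketch below takes that view with a closed-form ridge fit for brevity; the function names, the ridge solver, and the width are assumptions, and the cited result analyzes SGD on the network itself rather than this construction.

```python
# Loose illustration of the conjugate-kernel viewpoint: freeze the randomly
# initialized hidden layer and fit only the output layer, i.e., do linear
# regression in the random-feature space induced by the initialization.
import numpy as np

def random_relu_features(X, width=512, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, width)) / np.sqrt(d)  # standard 1/sqrt(d) init
    return np.maximum(X @ W, 0.0)

def fit_output_layer(X, y, width=512, reg=1e-3, seed=0):
    Phi = random_relu_features(X, width, seed)
    # Closed-form ridge fit of the output weights over the frozen features.
    A = Phi.T @ Phi + reg * np.eye(width)
    return np.linalg.solve(A, Phi.T @ y)
```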
How Many Samples are Needed to Estimate a Convolutional Neural Network?
A widespread folklore for explaining the success of Convolutional Neural Networks (CNNs) is that CNNs use a more compact representation than the Fully-connected Neural Network (FNN) and thus require …

References

Showing 1-10 of 40 references
Learning Kernel-Based Halfspaces with the 0-1 Loss
TLDR
A new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function is described and analyzed, and a hardness result is proved showing that, under a certain cryptographic assumption, no algorithm can learn kernel-based halfspaces in time polynomial in $L$.
L1-regularized Neural Networks are Improperly Learnable in Polynomial Time
TLDR
A kernel-based method is given such that, with probability at least $1-\delta$, it learns a predictor whose generalization error is at most $\epsilon$ worse than that of the neural network; this implies that any sufficiently sparse neural network is learnable in polynomial time.
Efficient Learning of Linear Separators under Bounded Noise
TLDR
This work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model and thus opens up a new and exciting line of research.
Agnostically learning halfspaces
We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is …
Efficient Learning of Linear Perceptrons
TLDR
It is proved that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in $1/\mu$ that is $\mu$-margin successful for all $\mu > 0$.
Generalization Bounds for Neural Networks through Tensor Factorization
TLDR
This work proposes a novel algorithm based on tensor decomposition for training a two-layer neural network, and proves efficient generalization bounds for this method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons.
Hardness of Learning Halfspaces with Noise
  • V. Guruswami, P. Raghavendra
  • Computer Science, Mathematics
  • 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
  • 2006
TLDR
It is proved that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense, and a strong hardness is obtained for another basic computational problem: solving a linear system over the rationals.
Learning Halfspaces with Malicious Noise
TLDR
New algorithms for learning halfspaces in the challenging malicious noise model can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension $n$, and succeed for the fairly broad class of all isotropic log-concave distributions.
Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods
TLDR
This work proposes a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer based on tensor decomposition, which provably converges to the global optimum, under a set of mild non-degeneracy conditions.
Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs
TLDR
It is shown that there are cases in which $\alpha = o(1/\gamma)$ but the problem is still solvable in polynomial time, and that these results naturally extend to the adversarial online learning model and to the PAC learning with malicious noise model.