# Learning Kernel-Based Halfspaces with the 0-1 Loss

@article{ShalevShwartz2011LearningKH,
  title={Learning Kernel-Based Halfspaces with the 0-1 Loss},
  author={Shai Shalev-Shwartz and Ohad Shamir and Karthik Sridharan},
  journal={SIAM J. Comput.},
  year={2011},
  volume={40},
  pages={1623--1646}
}
We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function. Unlike most of the previous formulations, which rely on surrogate convex loss functions (e.g., hinge loss in support vector machines (SVMs) and log loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural 0-1 loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time poly(exp(L log(L… [abstract truncated]

#### Citations

The Complexity of Learning Halfspaces using Generalized Linear Methods
• Mathematics, Computer Science
• COLT
• 2014
The main result shows that the approximation ratio of every efficient algorithm from this family must be $\ge \Omega\left(\frac{1/\gamma}{\mathrm{poly}\left(\log\left(1/\gamma\right)\right)}\right)$, essentially matching the best known upper bound.

Learning Halfspaces and Neural Networks with Random Initialization
• Mathematics, Computer Science
• ArXiv
• 2015
It is shown that if the data is separable by some neural network with constant margin $\gamma>0$, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin $\Omega(\gamma)$.

Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs
• Computer Science, Mathematics
• NIPS
• 2012
It is shown that there are cases in which $\alpha = o(1/\gamma)$ but the problem is still solvable in polynomial time, and that these results naturally extend to the adversarial online learning model and to the PAC learning with malicious noise model.

A PTAS for Agnostically Learning Halfspaces
We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the $d$-dimensional sphere.
Namely, we show that for every $\mu>0$ there is an algorithm that runs in time… [snippet truncated]

Reliably Learning the ReLU in Polynomial Time
• Computer Science, Mathematics
• COLT
• 2017
A hypothesis is constructed that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\mathcal{D}$, for any convex, bounded, and Lipschitz loss function.

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness
• Computer Science, Mathematics
• Electron. Colloquium Comput. Complex.
• 2014
It is shown that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to $\exp(-|x|^{0.99})$.

Embedding Hard Learning Problems into Gaussian Space
• Computer Science, Mathematics
• Electron. Colloquium Comput. Complex.
• 2014
The first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution is given, showing the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions.

L1-regularized Neural Networks are Improperly Learnable in Polynomial Time
• Mathematics, Computer Science
• ICML
• 2016
A kernel-based method, such that with probability at least $1 - \delta$, it learns a predictor whose generalization error is at most $\epsilon$ worse than that of the neural network, implies that any sufficiently sparse neural network is learnable in polynomial time.
Learning Neural Networks with Two Nonlinear Layers in Polynomial Time
• Computer Science
• COLT
• 2019
This work gives a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU), and suggests a new approach to Boolean learning problems via real-valued conditional-mean functions, sidestepping traditional hardness results from computational learning theory.

Efficient Learning of Linear Separators under Bounded Noise
• Computer Science, Mathematics
• COLT
• 2015
This work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model, and thus opens up a new and exciting line of research.

#### References (showing 1-10 of 40)

Agnostically learning halfspaces
• Mathematics, Computer Science
• 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05)
• 2005
We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is… [snippet truncated]

Efficient Learning of Linear Perceptrons
• Computer Science, Mathematics
• NIPS
• 2000
It is proved that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in $1/\mu$ that is $\mu$-margin successful for all $\mu > 0$.

Hardness of Learning Halfspaces with Noise
• Mathematics, Computer Science
• FOCS
• 2006
It is proved that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense, and a strong hardness is obtained for another basic computational problem: solving a linear system over the rationals.

Convexity, Classification, and Risk Bounds
• Mathematics
• 2006
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex… [snippet truncated]

Polynomial regression under arbitrary product distributions
• Mathematics, Computer Science
• Machine Learning
• 2010
A very simple proof that threshold functions over arbitrary product spaces have $\delta$-noise sensitivity $O(\sqrt{\delta})$, resolving an open problem suggested by Peres (2004).
Fast rates for support vector machines using Gaussian kernels
• Mathematics
• 2007
For binary classification we establish learning rates up to the order of $n^{-1}$ for support vector machines (SVMs) with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions… [snippet truncated]
On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
• Computer Science, Mathematics
• NIPS
• 2008
This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds for Rademacher and Gaussian complexities of… [snippet truncated]
New Results for Learning Noisy Parities and Halfspaces
• Computer Science, Mathematics
• 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
• 2006
The first nontrivial algorithm for learning parities with adversarial noise is given, which shows that learning of DNF expressions reduces to learning noisy parities of just logarithmic number of variables and that majorities of halfspaces are hard to PAC-learn using any representation.
Cryptographic Hardness for Learning Intersections of Halfspaces
• Mathematics, Computer Science
• 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
• 2006
The first representation-independent hardness results for PAC learning intersections of halfspaces are given, derived from two public-key cryptosystems due to Regev, which are based on the worst-case hardness of well-studied lattice problems.
Statistical behavior and consistency of classification methods based on convex risk minimization
We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification… [snippet truncated]
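The abstract contrasts the 0-1 loss with the convex surrogates used in practice (hinge loss in SVMs, log loss in logistic regression). As a minimal, self-contained sketch of those standard definitions, not the paper's algorithm, the snippet below evaluates all three losses for a kernel-based halfspace of the form $x \mapsto \mathrm{sign}(\sum_i \alpha_i k(x_i, x))$ with a Gaussian RBF kernel; the support points and coefficients are hypothetical toy values:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_margin(alphas, support, x, gamma=1.0):
    # f(x) = sum_i alpha_i * k(x_i, x); the halfspace predicts sign(f(x))
    return sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alphas, support))

def zero_one_loss(y, fx):
    # 0-1 loss: 1 if sign(f(x)) disagrees with the label y in {-1, +1}
    return 0.0 if y * fx > 0 else 1.0

def hinge_loss(y, fx):
    # convex surrogate used by SVMs: max(0, 1 - y * f(x))
    return max(0.0, 1.0 - y * fx)

def log_loss(y, fx):
    # convex surrogate used by logistic regression: log(1 + exp(-y * f(x)))
    return math.log1p(math.exp(-y * fx))

# hypothetical toy data: two support points with opposite coefficients
support = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, -1.0]

for x, y in [((0.2, 0.1), +1), ((1.9, 2.1), -1), ((0.9, 1.0), +1)]:
    fx = kernel_margin(alphas, support, x)
    print(f"f{x} = {fx:+.3f}  0-1: {zero_one_loss(y, fx):.0f}  "
          f"hinge: {hinge_loss(y, fx):.3f}  log: {log_loss(y, fx):.3f}")
```

Note that the hinge loss upper-bounds the 0-1 loss, which is what makes minimizing it tractable; the paper's point is to give finite time/sample guarantees directly for the 0-1 loss rather than for such a surrogate.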