# Learning Kernel-Based Halfspaces with the 0-1 Loss

@article{ShalevShwartz2011LearningKH, title={Learning Kernel-Based Halfspaces with the 0-1 Loss}, author={Shai Shalev-Shwartz and Ohad Shamir and Karthik Sridharan}, journal={SIAM J. Comput.}, year={2011}, volume={40}, pages={1623--1646} }

We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function. Unlike most previous formulations, which rely on surrogate convex loss functions (e.g., the hinge loss in support vector machines (SVMs) and the log-loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural 0-1 loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time poly$(\exp(L\log(L$…
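The distinction the abstract draws — optimizing the 0-1 loss directly rather than a convex surrogate — can be made concrete with a small sketch. The code below is illustrative only and does not reproduce the paper's algorithm; it just computes the three losses on the same signed margins and shows that the hinge loss pointwise upper-bounds the 0-1 loss (the log-loss does so up to a $1/\log 2$ factor). The data, weights, and noise rate are arbitrary choices for the demo.

```python
import numpy as np

def zero_one_loss(margins):
    """Fraction of examples misclassified: mean of 1{y * <w, x> <= 0}."""
    return np.mean(margins <= 0)

def hinge_loss(margins):
    """Average hinge loss: mean of max(0, 1 - y * <w, x>)."""
    return np.mean(np.maximum(0.0, 1.0 - margins))

def log_loss(margins):
    """Average logistic loss: mean of log(1 + exp(-y * <w, x>))."""
    return np.mean(np.log1p(np.exp(-margins)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = np.sign(X @ w_true)
y[rng.random(200) < 0.1] *= -1          # inject 10% label noise (agnostic setting)

w = w_true + 0.3 * rng.normal(size=5)   # some hypothesis near the truth
m = y * (X @ w)                          # signed margins y * <w, x>

losses = {
    "0-1": zero_one_loss(m),
    "hinge": hinge_loss(m),
    "log": log_loss(m),
}
```

Because the hinge loss dominates the 0-1 loss pointwise, minimizing it gives an upper bound on the classification error — but, as the paper emphasizes, a small surrogate risk need not translate into a correspondingly small 0-1 risk.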

#### 49 Citations

The Complexity of Learning Halfspaces using Generalized Linear Methods

- Mathematics, Computer Science
- COLT
- 2014

The main result shows that the approximation ratio of every efficient algorithm from this family must be $\ge \Omega\left(\frac{1/\gamma}{\mathrm{poly}\left(\log\left(1/\gamma\right)\right)}\right)$, essentially matching the best known upper bound.

Learning Halfspaces and Neural Networks with Random Initialization

- Mathematics, Computer Science
- ArXiv
- 2015

It is shown that if the data is separable by some neural network with constant margin $\gamma>0$, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin $\Omega(\gamma)$.

Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs

- Computer Science, Mathematics
- NIPS
- 2012

It is shown that there are cases in which α = o(1/γ) but the problem is still solvable in polynomial time, and that these results naturally extend to the adversarial online learning model and to the PAC model with malicious noise.

A PTAS for Agnostically Learning Halfspaces

- Mathematics, Computer Science
- COLT
- 2015

We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the $d$ dimensional sphere. Namely, we show that for every $\mu>0$ there is an algorithm that runs in time…

Reliably Learning the ReLU in Polynomial Time

- Computer Science, Mathematics
- COLT
- 2017

A hypothesis is constructed that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\cal{D}$, for any convex, bounded, and Lipschitz loss function.

Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness

- Computer Science, Mathematics
- Electron. Colloquium Comput. Complex.
- 2014

It is shown that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to $\exp(-|x|^{0.99})$.
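The object in this snippet — approximating the sign function by a low-degree polynomial under a distribution on the line — is easy to probe numerically. The sketch below is not the cited paper's construction; it simply fits least-squares polynomials to `sign(x)` on standard Gaussian samples (a log-concave case, where raising the degree does help) and reports the empirical $L_1$ error. The sample size and degrees are arbitrary demo choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)   # samples from a standard Gaussian
y = np.sign(x)                # target: the sign function

def l1_error(degree):
    """Empirical L1 error of the least-squares polynomial fit of sign(x).

    Polynomial.fit rescales x to [-1, 1] internally, keeping the
    Vandermonde system well conditioned at moderate degrees.
    """
    p = Polynomial.fit(x, y, degree)
    return np.mean(np.abs(p(x) - y))

errors = {d: l1_error(d) for d in (1, 3, 7)}
```

Under the Gaussian the error shrinks as the degree grows; the cited result shows that for certain non-log-concave densities no polynomial degree drives this error arbitrarily low, which is what limits the polynomial-regression approach to learning halfspaces.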

Embedding Hard Learning Problems into Gaussian Space

- Computer Science, Mathematics
- Electron. Colloquium Comput. Complex.
- 2014

The first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution is given, showing the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions.

L1-regularized Neural Networks are Improperly Learnable in Polynomial Time

- Mathematics, Computer Science
- ICML
- 2016

A kernel-based method is presented such that, with probability at least $1-\delta$, it learns a predictor whose generalization error is at most $\epsilon$ worse than that of the neural network; this implies that any sufficiently sparse neural network is learnable in polynomial time.

Learning Neural Networks with Two Nonlinear Layers in Polynomial Time

- Computer Science
- COLT
- 2019

This work gives a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU), and suggests a new approach to Boolean learning problems via real-valued conditional-mean functions, sidestepping traditional hardness results from computational learning theory.

Efficient Learning of Linear Separators under Bounded Noise

- Computer Science, Mathematics
- COLT
- 2015

This work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model and thus opens up a new and exciting line of research.

#### References

Showing 1–10 of 40 references.

Agnostically learning halfspaces

- Mathematics, Computer Science
- 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05)
- 2005

We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is…

Efficient Learning of Linear Perceptrons

- Computer Science, Mathematics
- NIPS
- 2000

It is proved that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in 1/µ that is µ-margin successful for all µ > 0.

Hardness of Learning Halfspaces with Noise

- Mathematics, Computer Science
- FOCS
- 2006

It is proved that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense, and strong hardness is also obtained for another basic computational problem: solving a linear system over the rationals.

Convexity, Classification, and Risk Bounds

- Mathematics
- 2006

Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex…

Polynomial regression under arbitrary product distributions

- Mathematics, Computer Science
- Machine Learning
- 2010

A very simple proof is given that threshold functions over arbitrary product spaces have δ-noise sensitivity $O(\sqrt{\delta})$, resolving an open problem posed by Peres (2004).

Fast rates for support vector machines using Gaussian kernels

- Mathematics
- 2007

For binary classification we establish learning rates up to the order of $n^{-1}$ for support vector machines (SVMs) with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions…

On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

- Computer Science, Mathematics
- NIPS
- 2008

This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds for Rademacher and Gaussian complexities of…

New Results for Learning Noisy Parities and Halfspaces

- Computer Science, Mathematics
- 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006

The first nontrivial algorithm for learning parities with adversarial noise is given, which shows that learning of DNF expressions reduces to learning noisy parities of just a logarithmic number of variables, and that majorities of halfspaces are hard to PAC-learn using any representation.

Cryptographic Hardness for Learning Intersections of Halfspaces

- Mathematics, Computer Science
- 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06)
- 2006

The first representation-independent hardness results for PAC learning intersections of halfspaces are given, derived from two public-key cryptosystems due to Regev, which are based on the worst-case hardness of well-studied lattice problems.

Statistical behavior and consistency of classification methods based on convex risk minimization

- Mathematics
- 2003

We study how closely the optimal Bayes error rate can be approached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification…