Learning Kernel-Based Halfspaces with the 0-1 Loss

@article{ShalevShwartz2011LearningKH,
  title={Learning Kernel-Based Halfspaces with the 0-1 Loss},
  author={S. Shalev-Shwartz and O. Shamir and Karthik Sridharan},
  journal={SIAM J. Comput.},
  year={2011},
  volume={40},
  pages={1623-1646}
}
We describe and analyze a new algorithm for agnostically learning kernel-based halfspaces with respect to the 0-1 loss function. Unlike most previous formulations, which rely on surrogate convex loss functions (e.g., the hinge loss in support vector machines (SVMs) and the log loss in logistic regression), we provide finite time/sample guarantees with respect to the more natural 0-1 loss function. The proposed algorithm can learn kernel-based halfspaces in worst-case time poly$(\exp(L\log(L/\epsilon)))$ …
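As a concrete point of reference for the surrogate-loss approach the abstract contrasts with, the following is a minimal sketch (our own illustration using scikit-learn and synthetic data, not the paper's algorithm): it trains an RBF-kernel SVM, which minimizes the convex hinge loss, and then reports the 0-1 loss, i.e., the classification error that the paper analyzes directly.

```python
# Surrogate-loss baseline (illustration only, not the paper's method):
# fit a kernel SVM with the convex hinge loss, then measure the 0-1 loss.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import zero_one_loss

# Synthetic data; shapes and parameters here are assumptions of ours.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# RBF-kernel SVM: optimizes a regularized hinge loss, a convex surrogate
# for the 0-1 loss that the paper targets directly.
clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_tr, y_tr)

print("test 0-1 loss:", zero_one_loss(y_te, clf.predict(X_te)))
```

The SVM above comes with guarantees only for its surrogate objective; the paper's contribution is to give finite time/sample guarantees directly for the 0-1 error.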
Citations

The Complexity of Learning Halfspaces using Generalized Linear Methods
TLDR
The main result shows that the approximation ratio of every efficient algorithm from this family must be $\ge \Omega\left(\frac{1/\gamma}{\mathrm{poly}\left(\log\left(1/\gamma\right)\right)}\right)$, essentially matching the best known upper bound.
Learning Halfspaces and Neural Networks with Random Initialization
TLDR
It is shown that if the data is separable by some neural network with constant margin $\gamma>0$, then there is a polynomial-time algorithm for learning a neural network that separates the training data with margin $\Omega(\gamma)$.
Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs
TLDR
It is shown that there are cases in which $\alpha = o(1/\gamma)$ but the problem is still solvable in polynomial time, and that these results naturally extend to the adversarial online learning model and to the PAC learning with malicious noise model.
A PTAS for Agnostically Learning Halfspaces
We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the $d$-dimensional sphere. Namely, we show that for every $\mu>0$ there is an algorithm that runs in time …
Reliably Learning the ReLU in Polynomial Time
TLDR
A hypothesis is constructed that simultaneously minimizes the false-positive rate and the loss on inputs given positive labels by $\mathcal{D}$, for any convex, bounded, and Lipschitz loss function.
Weighted Polynomial Approximations: Limits for Learning and Pseudorandomness
  • Mark Bun, T. Steinke
  • Electron. Colloquium Comput. Complex.
  • 2014
TLDR
It is shown that polynomials of any degree cannot approximate the sign function to within arbitrarily low error for a large class of non-log-concave distributions on the real line, including those with densities proportional to $\exp(-|x|^{0.99})$.
Embedding Hard Learning Problems into Gaussian Space
TLDR
The first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution is given, showing the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions.
L1-regularized Neural Networks are Improperly Learnable in Polynomial Time
TLDR
A kernel-based method is given such that, with probability at least $1-\delta$, it learns a predictor whose generalization error is at most $\epsilon$ worse than that of the neural network; this implies that any sufficiently sparse neural network is learnable in polynomial time.
Learning Neural Networks with Two Nonlinear Layers in Polynomial Time
TLDR
This work gives a polynomial-time algorithm for learning neural networks with one layer of sigmoids feeding into any Lipschitz, monotone activation function (e.g., sigmoid or ReLU), and suggests a new approach to Boolean learning problems via real-valued conditional-mean functions, sidestepping traditional hardness results from computational learning theory.
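To make the network class described above concrete, here is a minimal NumPy sketch (our own illustration with assumed dimensions and random weights, not the paper's learning algorithm) of one layer of sigmoids feeding into a monotone, Lipschitz outer activation such as ReLU.

```python
# Illustration of the function class only: a layer of sigmoids whose
# weighted sum passes through a monotone, Lipschitz outer activation.
# Shapes and weights below are assumptions of ours.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(t):
    return np.maximum(t, 0.0)

def two_nonlinear_layer_net(x, W1, w2, outer=relu):
    """x: (d,) input, W1: (k, d) inner weights, w2: (k,) outer weights."""
    hidden = sigmoid(W1 @ x)      # first nonlinear layer: sigmoids
    return outer(w2 @ hidden)     # second nonlinearity: monotone, Lipschitz

rng = np.random.default_rng(0)
d, k = 5, 3
print(two_nonlinear_layer_net(rng.normal(size=d),
                              rng.normal(size=(k, d)),
                              rng.normal(size=k)))
```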
Efficient Learning of Linear Separators under Bounded Noise
TLDR
This work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model, thus opening up a new and exciting line of research.

References

Showing 1-10 of 40 references
Agnostically learning halfspaces
We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is …
Efficient Learning of Linear Perceptrons
TLDR
It is proved that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in $1/\mu$ that is $\mu$-margin successful for all $\mu > 0$.
Hardness of Learning Halfspaces with Noise
TLDR
It is proved that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense, and strong hardness is obtained for another basic computational problem: solving a linear system over the rationals.
Convexity, Classification, and Risk Bounds
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex …
Polynomial regression under arbitrary product distributions
TLDR
A very simple proof is given that threshold functions over arbitrary product spaces have $\delta$-noise sensitivity $O(\sqrt{\delta})$, resolving an open problem suggested by Peres (2004).
Fast rates for support vector machines using Gaussian kernels
For binary classification we establish learning rates up to the order of $n^{-1}$ for support vector machines (SVMs) with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions …
On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds for Rademacher and Gaussian complexities of …
New Results for Learning Noisy Parities and Halfspaces
TLDR
The first nontrivial algorithm for learning parities with adversarial noise is given, which shows that learning of DNF expressions reduces to learning noisy parities of just a logarithmic number of variables and that majorities of halfspaces are hard to PAC-learn using any representation.
Cryptographic Hardness for Learning Intersections of Halfspaces
TLDR
The first representation-independent hardness results for PAC learning intersections of halfspaces are given, derived from two public-key cryptosystems due to Regev, which are based on the worst-case hardness of well-studied lattice problems.
Statistical behavior and consistency of classification methods based on convex risk minimization
We study how closely the optimal Bayes error rate can be approximately reached using a classification algorithm that computes a classifier by minimizing a convex upper bound of the classification …