# Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability

@article{Chen2020ClassificationUM, title={Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability}, author={Sitan Chen and Frederic Koehler and Ankur Moitra and Morris Yau}, journal={ArXiv}, year={2020}, volume={abs/2006.04787} }

In this paper we revisit some classic problems on classification under misspecification. In particular, we study the problem of learning halfspaces under Massart noise with rate $\eta$. In a recent work, Diakonikolas, Gouleakis, and Tzamos resolved a long-standing problem by giving the first efficient algorithm for learning to accuracy $\eta + \epsilon$ for any $\epsilon > 0$. However, their algorithm outputs a complicated hypothesis, which partitions space into $\text{poly}(d,1/\epsilon…
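To make the noise model in the abstract concrete, the following is a minimal sketch of how labels of a halfspace could be corrupted by Massart noise with rate $\eta$: each label is flipped independently with a point-dependent probability that never exceeds $\eta < 1/2$. The function names, the uniform distribution over the cube, and the particular `flip_rate` adversary are assumptions chosen for illustration; they are not taken from the paper, whose setting is distribution-independent.

```python
import random

def clean_label(w, x):
    """Label assigned by the halfspace sign(<w, x>), with ties sent to +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

def sample_massart(w, n, eta, flip_rate, rng):
    """Draw n examples (x, y, flipped) with x uniform in [-1, 1]^d (an assumption).

    flip_rate(x) is a hypothetical adversary's per-point flip probability;
    Massart noise requires 0 <= flip_rate(x) <= eta < 1/2 for every x.
    """
    if not 0 <= eta < 0.5:
        raise ValueError("eta must lie in [0, 1/2)")
    samples = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w]
        p = flip_rate(x)
        if not 0 <= p <= eta:
            raise ValueError("flip probability must lie in [0, eta]")
        flipped = rng.random() < p
        y = clean_label(w, x)
        samples.append((x, -y if flipped else y, flipped))
    return samples

# Illustrative adversary: flips at the maximum rate on one side, never on the other.
rng = random.Random(0)
data = sample_massart([1.0, -2.0], 1000, 0.2,
                      lambda x: 0.2 if x[0] > 0 else 0.0, rng)
noise_fraction = sum(flipped for _, _, flipped in data) / len(data)
print(f"fraction of flipped labels: {noise_fraction:.3f}")
```

Because the flip probability may vary from point to point, the overall fraction of corrupted labels can be well below $\eta$, which is why the target accuracy $\eta + \epsilon$ is not necessarily the optimal error.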

## 15 Citations

### Hardness of Learning Halfspaces with Massart Noise

- Computer Science
- ArXiv
- 2020

There is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by a polynomial-time SQ algorithm, and this lower bound implies that no efficient SQ algorithm can approximate the optimal error within any polynomial factor.

### Robust Learning under Strong Noise via SQs

- Computer Science, Mathematics
- AISTATS
- 2021

This work provides several new insights on the robustness of Kearns' statistical query framework against challenging label-noise models, and shows that every SQ learnable class admits an efficient learning algorithm with OPT + $\epsilon$ misclassification error for a broad class of noise models.

### Agnostic Learnability of Halfspaces via Logistic Loss

- Mathematics, Computer Science
- ICML
- 2022

A well-behaved distribution is constructed such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\text{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021b).

### A Polynomial Time Algorithm for Learning Halfspaces with Tsybakov Noise

- Computer Science
- ArXiv
- 2020

The first polynomial-time certificate algorithm for PAC learning homogeneous halfspaces in the presence of Tsybakov noise is given, which learns the true halfspace within any desired accuracy $\epsilon$ and succeeds under a broad family of well-behaved distributions including log-concave distributions.

### Learning general halfspaces with general Massart noise under the Gaussian distribution

- Computer Science, Mathematics
- STOC
- 2022

The techniques rely on determining the existence (or non-existence) of low-degree polynomials whose expectations distinguish Massart halfspaces from random noise, and establish a qualitatively matching lower bound of $d^{\Omega(\log(1/\gamma))}$ on the complexity of any Statistical Query (SQ) algorithm.

### Learning Halfspaces with Pairwise Comparisons: Breaking the Barriers of Query Complexity via Crowd Wisdom

- Computer Science
- ArXiv
- 2020

This paper shows that even when both labels and comparisons are corrupted by Massart noise, there is a polynomial-time algorithm that provably learns the underlying halfspace with near-optimal query complexity and noise tolerance, under the distribution-independent setting.

### Near-Optimal Statistical Query Hardness of Learning Halfspaces with Massart Noise

- Computer Science
- COLT
- 2022

It is shown that no efficient SQ algorithm for learning Massart halfspaces on $\mathbb{R}^d$ can achieve error better than $\Omega(\eta)$, even if $\text{OPT} = 2^{-\log^c(d)}$, for any universal constant $c \in (0, 1)$.

### Cryptographic Hardness of Learning Halfspaces with Massart Noise

- Computer Science
- ArXiv
- 2022

The main result is the first computational hardness result for this learning problem, which essentially resolves the polynomial PAC learnability of Massart halfspaces, by showing that known efficient learning algorithms for the problem are nearly best possible.

### Improved Algorithms for Efficient Active Learning Halfspaces with Massart and Tsybakov noise

- Computer Science
- COLT
- 2021

This work gives a computationally efficient PAC active learning algorithm for $d$-dimensional homogeneous halfspaces that tolerates Massart noise and Tsybakov noise, and identifies two subfamilies of noise conditions under which the algorithm provides label complexity guarantees strictly lower than those of passive learning algorithms.

### Efficiently learning halfspaces with Tsybakov noise

- Computer Science, Mathematics
- STOC
- 2021

The first polynomial-time algorithm for this fundamental learning problem of PAC learning homogeneous halfspaces with Tsybakov noise is given, which learns the true halfspace within any desired accuracy and succeeds under a broad family of well-behaved distributions including log-concave distributions.

## References

### Complexity theoretic limitations on learning halfspaces

- Computer Science, Mathematics
- STOC
- 2016

It is shown that no efficient learning algorithm has non-trivial worst-case performance even under the guarantees that $\mathrm{Err}_H(D) \le \eta$ for an arbitrarily small constant $\eta > 0$, and that $D$ is supported in the Boolean cube.

### Efficient Learning of Linear Separators under Bounded Noise

- Computer Science
- COLT
- 2015

This work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model and thus opens up a new and exciting line of research.

### Agnostically learning halfspaces

- Computer Science
- 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05)
- 2005

We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is…

### Revisiting Perceptron: Efficient and Label-Optimal Learning of Halfspaces

- Computer Science
- NIPS
- 2017

An efficient Perceptron-based algorithm for actively learning homogeneous halfspaces under the uniform distribution over the unit sphere is proposed and is converted to an efficient passive learning algorithm that has near-optimal sample complexities with respect to $\epsilon$ and $d$.

### Distribution-Independent PAC Learning of Halfspaces with Massart Noise

- Computer Science
- NeurIPS
- 2019

Prior to this work, no efficient weak (distribution-independent) learner was known in this model, even for the class of disjunctions; the paper also gives evidence that improving on the error guarantee of its algorithm might be computationally hard.

### Polynomial regression under arbitrary product distributions

- Computer Science, Mathematics
- Machine Learning
- 2010

A very simple proof is given that threshold functions over arbitrary product spaces have $\delta$-noise sensitivity $O(\sqrt{\delta})$, resolving an open problem suggested by Peres (2004).

### Learning Halfspaces with Massart Noise Under Structured Distributions

- Computer Science, Mathematics
- COLT 2020
- 2020

This work identifies a smooth non-convex surrogate loss with the property that any approximate stationary point of this loss defines a halfspace that is close to the target halfspace, and can be used to solve the underlying learning problem.

### A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions

- Computer Science, Mathematics
- Algorithmica
- 1998

It is shown how simple greedy methods can be used to find weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, without dependence on any separation parameter.

### Distribution-Independent Evolvability of Linear Threshold Functions

- Computer Science, Mathematics
- COLT
- 2011

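This paper shows that the answer to the open question of whether conjunctions are evolvable distribution-independently is negative, and contrasts this lower bound with a proof that linear threshold functions having a non-negligible margin on the data points are evolvable distribution-independently via a simple mutation algorithm.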

### On Basing Lower-Bounds for Learning on Worst-Case Assumptions

- Computer Science
- 2008 49th Annual IEEE Symposium on Foundations of Computer Science
- 2008

It is proved that if a language $L$ reduces to the task of improper learning of circuits, then, depending on the type of the reduction in use, either $L$ has a statistical zero-knowledge argument system, or the worst-case hardness of $L$ implies the existence of a weak variant of one-way functions defined by Ostrovsky-Wigderson (ISTCS '93).