• Corpus ID: 219531142

# Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability

@article{Chen2020ClassificationUM,
title={Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability},
author={Sitan Chen and Frederic Koehler and Ankur Moitra and Morris Yau},
journal={ArXiv},
year={2020},
volume={abs/2006.04787}
}
• Published 8 June 2020
• Computer Science
• ArXiv
In this paper we revisit some classic problems on classification under misspecification. In particular, we study the problem of learning halfspaces under Massart noise with rate $\eta$. In a recent work, Diakonikolas, Gouleakis, and Tzamos resolved a long-standing problem by giving the first efficient algorithm for learning to accuracy $\eta + \epsilon$ for any $\epsilon > 0$. However, their algorithm outputs a complicated hypothesis, which partitions space into $\text{poly}(d,1/\epsilon…

• Computer Science
ArXiv
• 2020
There is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by a polynomial-time SQ algorithm, and this lower bound implies that no efficient SQ algorithm can approximate the optimal error within any polynomial factor.
• Computer Science, Mathematics
AISTATS
• 2021
This work provides several new insights on the robustness of Kearns' statistical query framework against challenging label-noise models, and shows that every SQ learnable class admits an efficient learning algorithm with OPT + $\epsilon$ misclassification error for a broad class of noise models.
• Mathematics, Computer Science
ICML
• 2022
A well-behaved distribution is constructed such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\mathrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021b).
• Computer Science
ArXiv
• 2020
The first polynomial-time certificate algorithm for PAC learning homogeneous halfspaces in the presence of Tsybakov noise is given, which learns the true halfspace within any desired accuracy $\epsilon$ and succeeds under a broad family of well-behaved distributions including log-concave distributions.
• Computer Science, Mathematics
STOC
• 2022
The techniques rely on determining the existence (or non-existence) of low-degree polynomials whose expectations distinguish Massart halfspaces from random noise, and establish a qualitatively matching lower bound of $d^{\Omega(\log(1/\gamma))}$ on the complexity of any Statistical Query (SQ) algorithm.
• Computer Science
ArXiv
• 2020
This paper shows that even when both labels and comparisons are corrupted by Massart noise, there is a polynomial-time algorithm that provably learns the underlying halfspace with near-optimal query complexity and noise tolerance, under the distribution-independent setting.
• Computer Science
COLT
• 2022
It is shown that no efficient SQ algorithm for learning Massart halfspaces on $\mathbb{R}^d$ can achieve error better than $\Omega(\eta)$, even if $\mathrm{OPT} = 2^{-\log^c(d)}$, for any universal constant $c \in (0, 1)$.
• Computer Science
ArXiv
• 2022
The main result is the first computational hardness result for this learning problem, which essentially resolves the polynomial PAC learnability of Massart halfspaces, by showing that known efficient learning algorithms for the problem are nearly best possible.
• Computer Science
COLT
• 2021
A computationally efficient PAC active learning algorithm for $d$-dimensional homogeneous halfspaces that can tolerate Massart noise and Tsybakov noise and identifies two subfamilies of noise conditions, under which the efficient algorithm provides label complexity guarantees strictly lower than passive learning algorithms.
• Computer Science, Mathematics
STOC
• 2021
The first polynomial-time algorithm for this fundamental learning problem of PAC learning homogeneous halfspaces with Tsybakov noise is given, which learns the true halfspace within any desired accuracy and succeeds under a broad family of well-behaved distributions including log-concave distributions.
## References
It is shown that no efficient learning algorithm has non-trivial worst-case performance even under the guarantees that $\mathrm{Err}_H(D) \le \eta$ for an arbitrarily small constant $\eta > 0$, and that $D$ is supported in the Boolean cube.
• Computer Science
COLT
• 2015
This work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model and thus opens up a new and exciting line of research.
• Computer Science
46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05)
• 2005
We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is
• Computer Science
NIPS
• 2017
An efficient Perceptron-based algorithm for actively learning homogeneous halfspaces under the uniform distribution over the unit sphere is proposed and is converted to an efficient passive learning algorithm that has near-optimal sample complexities with respect to $\epsilon$ and $d$.
• Computer Science
NeurIPS
• 2019
No efficient weak (distribution-independent) learner was known in this model, even for the class of disjunctions, so there is evidence that improving on the error guarantee of the algorithm might be computationally hard.
• Computer Science, Mathematics
Machine Learning
• 2010
A very simple proof that threshold functions over arbitrary product spaces have δ-noise sensitivity $O(\sqrt{\delta})$, resolving an open problem suggested by Peres (2004).
• Computer Science, Mathematics
COLT 2020
• 2020
This work identifies a smooth non-convex surrogate loss with the property that any approximate stationary point of this loss defines a halfspace that is close to the target halfspace, and can be used to solve the underlying learning problem.
• Computer Science, Mathematics
Algorithmica
• 1998
It is shown how simple greedy methods can be used to find weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, without dependence on any separation parameter.
• Computer Science
2008 49th Annual IEEE Symposium on Foundations of Computer Science
• 2008
It is proved that if a language L reduces to the task of improper learning of circuits, then, depending on the type of the reduction in use, either L has a statistical zero-knowledge argument system, or the worst-case hardness of L implies the existence of a weak variant of one-way functions defined by Ostrovsky-Wigderson (ISTCS '93).
It is shown that evolvability is equivalent to learnability by a restricted form of statistical queries, and it is proved that for any fixed distribution D over the instance space, every class of functions learnable by SQs over D is evolvable over D.