• Corpus ID: 219531142

Classification Under Misspecification: Halfspaces, Generalized Linear Models, and Connections to Evolvability

Sitan Chen, Frederic Koehler, Ankur Moitra, Morris Yau
In this paper we revisit some classic problems on classification under misspecification. In particular, we study the problem of learning halfspaces under Massart noise with rate $\eta$. In a recent work, Diakonikolas, Gouleakis, and Tzamos resolved a long-standing problem by giving the first efficient algorithm for learning to accuracy $\eta + \epsilon$ for any $\epsilon > 0$. However, their algorithm outputs a complicated hypothesis, which partitions space into $\text{poly}(d,1/\epsilon… 


Hardness of Learning Halfspaces with Massart Noise

There is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by a polynomial-time SQ algorithm, and this lower bound implies that no efficient SQ algorithm can approximate the optimal error within any polynomial factor.

Robust Learning under Strong Noise via SQs

This work provides several new insights on the robustness of Kearns' statistical query framework against challenging label-noise models, and shows that every SQ learnable class admits an efficient learning algorithm with OPT + $\epsilon$ misclassification error for a broad class of noise models.

Agnostic Learnability of Halfspaces via Logistic Loss

A well-behaved distribution is constructed such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\text{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021b).

A Polynomial Time Algorithm for Learning Halfspaces with Tsybakov Noise

The first polynomial-time certificate algorithm for PAC learning homogeneous halfspaces in the presence of Tsybakov noise is given, which learns the true halfspace within any desired accuracy $\epsilon$ and succeeds under a broad family of well-behaved distributions including log-concave distributions.

Learning general halfspaces with general Massart noise under the Gaussian distribution

The techniques rely on determining the existence (or non-existence) of low-degree polynomials whose expectations distinguish Massart halfspaces from random noise, and establish a qualitatively matching lower bound of $d^{\Omega(\log(1/\gamma))}$ on the complexity of any Statistical Query (SQ) algorithm.

Learning Halfspaces with Pairwise Comparisons: Breaking the Barriers of Query Complexity via Crowd Wisdom

This paper shows that even when both labels and comparisons are corrupted by Massart noise, there is a polynomial-time algorithm that provably learns the underlying halfspace with near-optimal query complexity and noise tolerance, under the distribution-independent setting.

Near-Optimal Statistical Query Hardness of Learning Halfspaces with Massart Noise

It is shown that no efficient SQ algorithm for learning Massart halfspaces on $\mathbb{R}^d$ can achieve error better than $\Omega(\eta)$, even if $\text{OPT} = 2^{-\log^c(d)}$, for any universal constant $c \in (0, 1)$.

Cryptographic Hardness of Learning Halfspaces with Massart Noise

The main result is the first computational hardness result for this learning problem, which essentially resolves the polynomial PAC learnability of Massart halfspaces, by showing that known efficient learning algorithms for the problem are nearly best possible.

Improved Algorithms for Efficient Active Learning Halfspaces with Massart and Tsybakov noise

A computationally efficient PAC active learning algorithm is given for $d$-dimensional homogeneous halfspaces that can tolerate Massart noise and Tsybakov noise. The work also identifies two subfamilies of noise conditions under which the algorithm provides label complexity guarantees strictly lower than those of passive learning algorithms.

Efficiently learning halfspaces with Tsybakov noise

The first polynomial-time algorithm for this fundamental learning problem of PAC learning homogeneous halfspaces with Tsybakov noise is given, which learns the true halfspace within any desired accuracy and succeeds under a broad family of well-behaved distributions including log-concave distributions.



Complexity theoretic limitations on learning halfspaces

It is shown that no efficient learning algorithm has non-trivial worst-case performance even under the guarantees that $\mathrm{Err}_H(D) \le \eta$ for an arbitrarily small constant $\eta > 0$, and that $D$ is supported in the Boolean cube.

Efficient Learning of Linear Separators under Bounded Noise

This work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model and thus opens up a new and exciting line of research.

Agnostically learning halfspaces

We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is

Revisiting Perceptron: Efficient and Label-Optimal Learning of Halfspaces

An efficient Perceptron-based algorithm for actively learning homogeneous halfspaces under the uniform distribution over the unit sphere is proposed and is converted to an efficient passive learning algorithm that has near-optimal sample complexities with respect to $\epsilon$ and $d$.

Distribution-Independent PAC Learning of Halfspaces with Massart Noise

No efficient weak (distribution-independent) learner was known in this model, even for the class of disjunctions, so there is evidence that improving on the error guarantee of the algorithm might be computationally hard.

Polynomial regression under arbitrary product distributions

A very simple proof that threshold functions over arbitrary product spaces have $\delta$-noise sensitivity $O(\sqrt{\delta})$, resolving an open problem suggested by Peres (2004).

Learning Halfspaces with Massart Noise Under Structured Distributions

This work identifies a smooth non-convex surrogate loss with the property that any approximate stationary point of this loss defines a halfspace that is close to the target halfspace, and can be used to solve the underlying learning problem.

A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions

It is shown how simple greedy methods can be used to find weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, without dependence on any separation parameter.

Distribution-Independent Evolvability of Linear Threshold Functions

This paper presents a proof that linear threshold functions having a nonnegligible margin on the data points are evolvable distribution-independently via a simple mutation algorithm and shows that the answer is negative.

On Basing Lower-Bounds for Learning on Worst-Case Assumptions

It is proved that if a language L reduces to the task of improper learning of circuits, then, depending on the type of the reduction in use, either L has a statistical zero-knowledge argument system, or the worst-case hardness of L implies the existence of a weak variant of one-way functions defined by Ostrovsky-Wigderson (ISTCS '93).