Corpus ID: 231861616

Agnostic Proper Learning of Halfspaces under Gaussian Marginals

@inproceedings{Diakonikolas2021AgnosticPL,
  title={Agnostic Proper Learning of Halfspaces under Gaussian Marginals},
  author={Ilias Diakonikolas and Daniel M. Kane and Vasilis Kontonis and Christos Tzamos and Nikos Zarifis},
  booktitle={Conference on Learning Theory (COLT)},
  year={2021}
}
We study the problem of agnostically learning halfspaces under the Gaussian distribution. Our main result is the first proper learning algorithm for this problem whose sample complexity and computational complexity qualitatively match those of the best known improper agnostic learner. Building on this result, we also obtain the first proper polynomial-time approximation scheme (PTAS) for agnostically learning homogeneous halfspaces. Our techniques naturally extend to agnostically learning… 
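In the notation standard for this problem (a paraphrase for reference, not text from the paper): a halfspace is a function $h_{w,\theta}(x) = \mathrm{sign}(\langle w, x \rangle - \theta)$; the learner receives i.i.d. pairs $(x, y)$ with $x \sim \mathcal{N}(0, I_d)$ and labels $y \in \{\pm 1\}$ that may be arbitrary, and $\mathrm{OPT} = \min_{w,\theta} \Pr[h_{w,\theta}(x) \neq y]$ is the error of the best halfspace. An agnostic learner must output a hypothesis $\hat{h}$ with $\Pr[\hat{h}(x) \neq y] \le \mathrm{OPT} + \epsilon$; a proper learner is additionally required to output a halfspace $\hat{h} = h_{\hat{w},\hat{\theta}}$ rather than an arbitrary hypothesis such as a polynomial threshold function.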

Citations

Testing distributional assumptions of learning algorithms

A model is proposed for systematically studying the design of tester-learner pairs (T, A): if the distribution on the examples in the data passes the tester T, then one can safely trust the output of the agnostic learner A on that data.
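To make the pattern concrete, here is a minimal Python sketch of a tester-learner pair; the tester, learner, and the toy instantiations below are hypothetical illustrations, not the interfaces or algorithms of the cited paper.

import numpy as np

def tester_learner(X, y, run_tester, run_agnostic_learner):
    # Run the learner only if the distribution tester accepts the sample;
    # otherwise make no promise about the output.
    if not run_tester(X):
        return None
    return run_agnostic_learner(X, y)

# Toy instantiation (illustrative only): a crude check of the first two
# empirical moments against a standard Gaussian, and a least-squares "learner".
def toy_gaussianity_tester(X, tol=0.2):
    return bool(np.abs(X.mean(axis=0)).max() < tol and
                np.abs(X.var(axis=0) - 1.0).max() < tol)

def toy_halfspace_learner(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w / np.linalg.norm(w)

X = np.random.randn(2000, 5)
y = np.sign(X @ np.ones(5))
w_hat = tester_learner(X, y, toy_gaussianity_tester, toy_halfspace_learner)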

Understanding Simultaneous Train and Test Robustness

This work shows that the two seemingly different notions of robustness at train time and test time are closely related, and that this connection can be leveraged to develop algorithmic techniques applicable in both settings.

Approximate Maximum Halfspace Discrepancy

A key technical result is an ε-approximate halfspace range counting data structure of size O(1/ε) with O(log(1/ε)) query time, which can be built in O(|X| + (1/ε) log^4(1/ε)) time.
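For intuition, a much simpler baseline for approximate halfspace range counting is to answer queries from a uniform random subsample; the sketch below is that Monte Carlo baseline only, not the O(1/ε)-size, O(log(1/ε))-query-time structure referred to above.

import numpy as np

class SampledHalfspaceCounter:
    # Approximate halfspace range counting from a uniform random sample.
    # count(w, b) estimates |{x in X : <w, x> >= b}| by rescaling the
    # fraction observed in the stored subsample. Illustrative baseline only.
    def __init__(self, X, sample_size, seed=0):
        rng = np.random.default_rng(seed)
        self.n = len(X)
        idx = rng.choice(self.n, size=min(sample_size, self.n), replace=False)
        self.sample = X[idx]

    def count(self, w, b):
        frac = np.mean(self.sample @ w >= b)
        return frac * self.n

# Usage: estimate how many points fall in the halfspace {x : x_0 + x_1 >= 0}.
X = np.random.randn(100000, 2)
counter = SampledHalfspaceCounter(X, sample_size=2000)
approx = counter.count(np.array([1.0, 1.0]), 0.0)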

Learning general halfspaces with general Massart noise under the Gaussian distribution

The techniques rely on determining the existence (or non-existence) of low-degree polynomials whose expectations distinguish Massart halfspaces from random noise, and establish a qualitatively matching lower bound of $d^{\Omega(\log(1/\gamma))}$ on the complexity of any Statistical Query (SQ) algorithm.
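The moment-matching recipe behind such SQ lower bounds can be stated schematically (a standard framework, paraphrased here rather than quoted from the paper): construct a univariate distribution $A$ that agrees with $\mathcal{N}(0,1)$ on its first $k$ moments while encoding the noisy halfspace along a hidden direction $v$, and take the high-dimensional distribution that looks like $A$ along $v$ and is Gaussian in every orthogonal direction. Identifying $v$ from such a family costs any SQ algorithm roughly $d^{\Omega(k)}$ resources, and choosing $k = \Theta(\log(1/\gamma))$ yields the stated $d^{\Omega(\log(1/\gamma))}$ bound.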

References

The Optimality of Polynomial Regression for Agnostic Learning under Gaussian Marginals

It is shown that the $L_1$-polynomial regression algorithm is essentially best possible among SQ algorithms, and therefore that the SQ complexity of agnostic learning is closely related to the polynomial degree required to approximate any function from the concept class in $L_1$-norm.
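Concretely, $L_1$-polynomial regression here means the standard two-step procedure (summarized from the well-known Kalai-Klivans-Mansour-Servedio approach, not from this abstract): first fit a low-degree polynomial in empirical $L_1$-norm, $p^* \in \arg\min_{\deg p \le k} \frac{1}{|S|} \sum_{(x,y) \in S} |p(x) - y|$, then output the thresholded hypothesis $\hat{h}(x) = \mathrm{sign}(p^*(x) - t)$ for a suitably chosen threshold $t$. Under Gaussian marginals a degree $k = \mathrm{poly}(1/\epsilon)$ suffices for halfspaces, but the resulting hypothesis is a polynomial threshold function rather than a halfspace, i.e. the learner is improper; that is the gap the headline paper closes.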

Agnostically learning halfspaces

We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is given access to labeled examples whose labels may be arbitrary (in particular, adversarially noisy).

Non-Convex SGD Learns Halfspaces with Adversarial Label Noise

For a broad family of structured distributions, including log-concave distributions, it is shown that non-convex SGD efficiently converges to a solution with misclassification error $O(\mathrm{opt}) + \epsilon$, where $\mathrm{opt}$ is the misclassification error of the best-fitting halfspace.
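A minimal sketch of this style of algorithm follows: projected SGD on a smooth non-convex surrogate of the 0-1 loss for a homogeneous halfspace (the sigmoid surrogate, step size, and projection below are illustrative assumptions, not the exact algorithm or parameters of the cited paper).

import numpy as np

def surrogate_sgd_halfspace(X, y, epochs=5, lr=0.1, seed=0):
    # Projected SGD on the per-example loss sigma(-y * <w, x>), where
    # sigma(t) = 1 / (1 + exp(-t)) is a smooth non-convex proxy for the
    # 0-1 loss; w is renormalized to the unit sphere after every step.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for i in rng.permutation(n):
            m = -y[i] * (X[i] @ w)
            s = 1.0 / (1.0 + np.exp(-m))
            grad = s * (1.0 - s) * (-y[i]) * X[i]   # gradient of sigma(-y<w,x>)
            w -= lr * grad
            w /= np.linalg.norm(w)
    return w

# Usage: labels from a halfspace with a small fraction of adversarial flips.
rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 10))
w_star = np.ones(10) / np.sqrt(10)
y = np.sign(X @ w_star)
y[rng.random(5000) < 0.05] *= -1
w_hat = surrogate_sgd_halfspace(X, y)
error = np.mean(np.sign(X @ w_hat) != np.sign(X @ w_star))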

Toward efficient agnostic learning

An investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions is initiated, providing an initial outline of the possibilities for agnostic learning.

Learning geometric concepts with nasty noise

The first polynomial-time PAC learning algorithms for low-degree PTFs and intersections of halfspaces with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution are given.

New Results for Learning Noisy Parities and Halfspaces

The first nontrivial algorithm for learning parities with adversarial noise is given; it is also shown that learning DNF expressions reduces to learning noisy parities on just a logarithmic number of variables, and that majorities of halfspaces are hard to PAC-learn using any representation.

Complexity theoretic limitations on learning halfspaces

It is shown that no efficient learning algorithm has non-trivial worst-case performance even under the guarantees that $\mathrm{Err}_H(D) \le \eta$ for arbitrarily small constant $\eta > 0$ and that $D$ is supported on the Boolean cube.

Hardness of Learning Halfspaces with Noise

V. Guruswami and P. Raghavendra · 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06), 2006
It is proved that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense, and a strong hardness result is obtained for another basic computational problem: solving a linear system over the rationals.

Approximation Schemes for ReLU Regression

The main insight is a new characterization of surrogate losses for nonconvex activations, showing that properties of the underlying distribution actually induce strong convexity for the loss, allowing us to relate the global minimum to the activation's Chow parameters.
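For context, the Chow parameters referenced here are the degree-0 and degree-1 moments of a function against the underlying distribution (standard definition, not specific to this paper): for $f : \mathbb{R}^d \to \mathbb{R}$ and $x \sim D$, they are $\hat{f}_0 = \mathbf{E}[f(x)]$ and $\hat{f}_i = \mathbf{E}[f(x)\, x_i]$ for $i = 1, \dots, d$. For halfspaces, and for ReLUs under suitable distributional assumptions, these parameters essentially determine the target, which is what allows a global minimizer of the surrogate loss to be translated back into a good parameter vector.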

Learning Halfspaces with Malicious Noise

New algorithms for learning halfspaces in the challenging malicious noise model can tolerate malicious noise rates exponentially larger than previous work in terms of the dependence on the dimension n, and succeed for the fairly broad class of all isotropic log-concave distributions.