Corpus ID: 246706042

Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks

@article{Chen2022HardnessON,
  title={Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks},
  author={Sitan Chen and Aravind Gollakota and Adam R. Klivans and Raghu Meka},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.05258}
}
We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No general SQ lower bounds were known for learning ReLU networks of any depth in this setting: previous SQ lower bounds held only for adversarial noise models (agnostic learning) [KK14, GGK20, DKZ20] or restricted models such as correlational SQ [GGJ+20, DKKZ20]. Prior work hinted at the impossibility of our result: Vempala… 
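
To make the learning setup concrete, below is a minimal Python sketch of the noise-free model the abstract refers to: inputs are drawn from a standard Gaussian, and labels are produced exactly (no label noise) by an unknown two-hidden-layer ReLU network. The dimensions, widths, and random weights are illustrative placeholders, not taken from the paper.

# Sketch of the noise-free learning setup: labels come exactly from a
# two-hidden-layer ReLU network evaluated on standard Gaussian inputs.
# All sizes and weights below are arbitrary/illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, k1, k2, n = 50, 10, 10, 1000  # input dim, hidden widths, sample count

# Unknown target network: x -> w3 . relu(W2 relu(W1 x + b1) + b2)
W1, b1 = rng.standard_normal((k1, d)), rng.standard_normal(k1)
W2, b2 = rng.standard_normal((k2, k1)), rng.standard_normal(k2)
w3 = rng.standard_normal(k2)

def relu(z):
    return np.maximum(z, 0.0)

def target(x):
    return w3 @ relu(W2 @ relu(W1 @ x + b1) + b2)

# Noise-free samples: x ~ N(0, I_d), y = f(x) exactly.
X = rng.standard_normal((n, d))
y = np.array([target(x) for x in X])
# A learner (SQ or otherwise) sees only (X, y); the paper's lower bound rules
# out efficient SQ algorithms for recovering f in general.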
2 Citations

Training Fully Connected Neural Networks is ∃R-Complete
TLDR
The algorithmic problem of finding optimal weights and biases that fit a two-layer fully connected neural network to a given set of data points is considered, and it is shown that even very simple networks are difficult to train.

References

SHOWING 1-10 OF 81 REFERENCES
Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks
TLDR
The first polynomial-time algorithm for PAC learning one-hidden-layer ReLU networks with arbitrary real coefficients is given, and a statistical query lower bound of $d^{\Omega(k)}$ is proved.
Learning Deep ReLU Networks Is Fixed-Parameter Tractable
TLDR
An algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters is given, yielding the first nontrivial results for networks of depth more than two.
Embedding Hard Learning Problems into Gaussian Space
TLDR
The first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution is given, showing the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions.
On the Complexity of Learning Neural Networks
TLDR
A comprehensive lower bound is demonstrated, ruling out the possibility that data generated by neural networks with a single hidden layer, smooth activation functions, and benign input distributions can be learned efficiently; the bound is robust to small perturbations of the true weights.
On the Cryptographic Hardness of Learning Single Periodic Neurons
TLDR
This work demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise and designs a polynomial-time algorithm for learning certain families of such functions under exponentially small adversarial noise.
Distribution-Specific Hardness of Learning Neural Networks (O. Shamir, J. Mach. Learn. Res., 2018)
TLDR
This paper identifies a family of simple target functions that are difficult to learn even when the input distribution is "nice", providing evidence that neither distributional assumptions nor assumptions on the network weights alone are sufficient for efficient learning.
Continuous LWE
TLDR
A polynomial-time quantum reduction from worst-case lattice problems to CLWE is given, showing that CLWE enjoys similar hardness guarantees to those of LWE.
From average case complexity to improper learning complexity
TLDR
A new technique for proving hardness of improper learning, based on reductions from problems that are hard on average, is introduced, and a (fairly strong) generalization of Feige's assumption about the complexity of refuting random constraint satisfaction problems is put forward.
From Local Pseudorandom Generators to Hardness of Learning
TLDR
This work proves hardness-of-learning results under a well-studied assumption on the existence of local pseudorandom generators, which implies the hardness of virtually all improper PAC-learning problems (both distribution-free and distribution-specific) that were previously shown hard under other assumptions.
Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
TLDR
It is proved that any classifier trained using gradient descent with respect to square-loss will fail to achieve small test error in polynomial time given access to samples labeled by a one-layer neural network.
...