Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks
@article{Chen2022HardnessON,
  title={Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks},
  author={Sitan Chen and Aravind Gollakota and Adam R. Klivans and Raghu Meka},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.05258}
}
We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No general SQ lower bounds were known for learning ReLU networks of any depth in this setting: previous SQ lower bounds held only for adversarial noise models (agnostic learning) [KK14, GGK20, DKZ20] or restricted models such as correlational SQ [GGJ+20, DKKZ20]. Prior work hinted at the impossibility of our result: Vempala…
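For readers unfamiliar with the statistical query framework that the lower bound concerns, the following minimal Python sketch (not from the paper; names such as `sq_oracle` and `two_hidden_layer_relu` are hypothetical, chosen for exposition) illustrates how an SQ learner interacts with Gaussian-labeled data: it never sees individual samples, only answers to expectation queries that are accurate up to a tolerance tau.

```python
# Illustrative sketch only: the SQ access model for Gaussian inputs labeled by a
# two-hidden-layer ReLU network. The tolerance is simulated here by a finite-sample
# estimate plus a bounded perturbation; a real SQ oracle may answer adversarially
# within the tolerance.
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 5  # input dimension and hidden width (arbitrary illustrative choices)

# A fixed two-hidden-layer ReLU network f(x) = w2 . relu(W1 . relu(W0 x))
W0 = rng.standard_normal((k, d))
W1 = rng.standard_normal((k, k))
w2 = rng.standard_normal(k)
relu = lambda z: np.maximum(z, 0.0)

def two_hidden_layer_relu(X):
    """Evaluate the network on a batch of inputs X of shape (n, d)."""
    return relu(relu(X @ W0.T) @ W1.T) @ w2

def sq_oracle(phi, tau=1e-2, n=20_000):
    """Answer the statistical query E_{x ~ N(0, I_d)}[phi(x, f(x))] up to
    additive error tau (simulated via sampling plus a bounded perturbation)."""
    X = rng.standard_normal((n, d))
    y = two_hidden_layer_relu(X)
    estimate = np.mean([phi(x, yi) for x, yi in zip(X, y)])
    return estimate + tau * rng.uniform(-1, 1)

# Example query: the correlational statistic E[y * x_0]
print(sq_oracle(lambda x, y: y * x[0]))
```

In these terms, the paper's result says that any learner restricted to such queries needs superpolynomially many of them (or superpolynomially small tolerance) to learn two-hidden-layer ReLU networks over Gaussian inputs, even without any label noise.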
2 Citations
Training Fully Connected Neural Networks is $\exists\mathbb{R}$-Complete
- Computer Science · ArXiv
- 2022
The algorithmic problem of finding optimal weights and biases that fit a two-layer fully connected neural network to a given set of data points is considered, and it is shown that even very simple networks are difficult to train.
References
Showing 1-10 of 81 references
Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks
- Computer Science · COLT
- 2020
The first polynomial-time algorithm for PAC learning one-hidden-layer ReLU networks with arbitrary real coefficients is given, and a statistical query lower bound of $d^{\Omega(k)}$ is proved.
Learning Deep ReLU Networks Is Fixed-Parameter Tractable
- Computer Science, Mathematics · 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS)
- 2022
An algorithm is given whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters, yielding the first nontrivial results for networks of depth more than two.
Embedding Hard Learning Problems into Gaussian Space
- Computer Science · Electron. Colloquium Comput. Complex.
- 2014
The first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution is given, showing the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions.
On the Complexity of Learning Neural Networks
- Computer Science · NIPS
- 2017
A comprehensive lower bound is demonstrated, ruling out the possibility that data generated by neural networks with a single hidden layer, smooth activation functions, and benign input distributions can be learned efficiently; the bound is robust to small perturbations of the true weights.
On the Cryptographic Hardness of Learning Single Periodic Neurons
- Computer Science, Mathematics · NeurIPS
- 2021
This work demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise and designs a polynomial-time algorithm for learning certain families of such functions under exponentially small adversarial noise.
Distribution-Specific Hardness of Learning Neural Networks
- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2018
This paper identifies a family of simple target functions, which are difficult to learn even if the input distribution is "nice", and provides evidence that neither class of assumptions alone is sufficient.
Continuous LWE
- Computer Science, Mathematics · Electron. Colloquium Comput. Complex.
- 2020
A polynomial-time quantum reduction from worst-case lattice problems to CLWE is given, showing that CLWE enjoys similar hardness guarantees to those of LWE.
From average case complexity to improper learning complexity
- Computer Science · STOC
- 2014
A new technique for proving hardness of improper learning, based on reductions from problems that are hard on average, is introduced, and a (fairly strong) generalization of Feige's assumption on the complexity of refuting random constraint satisfaction problems is put forward.
From Local Pseudorandom Generators to Hardness of Learning
- Computer Science, Mathematics · COLT
- 2021
This work proves hardness-of-learning results under a well-studied assumption on the existence of local pseudorandom generators; this assumption implies the hardness of virtually all improper PAC-learning problems (both distribution-free and distribution-specific) that were previously shown hard under other assumptions.
Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent
- Computer Science · ICML
- 2020
It is proved that any classifier trained using gradient descent with respect to the square loss will fail to achieve small test error in polynomial time, given access to samples labeled by a one-layer neural network.