Big Neural Networks Waste Capacity

Yann Dauphin and Yoshua Bengio
This article exposes the failure of some big neural networks to leverage added capacity to reduce underfitting. Past research suggests diminishing returns when increasing the size of neural networks. Our experiments on ImageNet LSVRC2010 show that this may be because returns on added capacity diminish sharply in terms of training error, leading to underfitting. This suggests that the optimization method, first-order gradient descent, fails in this regime. Directly attacking this…
This paper has 45 citations.