Deep linear neural networks with arbitrary loss: All local minima are global


We consider deep linear networks with an arbitrary differentiable loss. We provide a short and elementary proof of the following fact: all local minima are global minima if each hidden layer is at least as wide as either the input layer or the output layer.
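To make the setting concrete, here is a minimal sketch (with hypothetical dimensions and random data, not from the paper) of a deep linear network: the output is a composition of weight matrices with no activations, so the whole network computes a single linear map, and an arbitrary differentiable loss is then applied to that output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: input width d_x, output width d_y,
# and hidden widths at least as wide as the input layer.
d_x, d_y, hidden = 4, 3, 5
widths = [d_x, hidden, hidden, d_y]  # input -> two hidden layers -> output

# A deep linear network maps x -> W_L ... W_2 W_1 x (no nonlinearities).
weights = [rng.standard_normal((widths[i + 1], widths[i]))
           for i in range(len(widths) - 1)]

def forward(x, weights):
    for W in weights:
        x = W @ x
    return x

# Composing the layers yields one linear map M = W_3 W_2 W_1,
# so the network output equals M @ x.
M = weights[2] @ weights[1] @ weights[0]
x = rng.standard_normal(d_x)
assert np.allclose(forward(x, weights), M @ x)

# An arbitrary differentiable loss is applied to the output;
# squared error is one example of such a loss.
y = rng.standard_normal(d_y)
loss = 0.5 * np.sum((forward(x, weights) - y) ** 2)
```

The optimization problem studied in the paper is over the factors `W_1, ..., W_L` jointly, not over the product `M` directly; the theorem concerns the local minima of that factored, nonconvex problem.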
