Deep linear neural networks with arbitrary loss: All local minima are global

Abstract

We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the following fact: all local minima are global minima if each hidden layer is at least as wide as either the input layer or the output layer.
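
For concreteness, here is a minimal formalization of the setup and the claim as stated in the abstract; the notation (weights $W_k$, widths $d_k$, loss $\ell$) is ours and not taken from this page:

```latex
% A deep linear network: the output is a product of weight matrices applied
% to the input, with input dimension d_0 and output dimension d_N.
\[
  \hat{y} = W_N W_{N-1} \cdots W_1 x,
  \qquad W_k \in \mathbb{R}^{d_k \times d_{k-1}}, \quad k = 1, \ldots, N.
\]
% The loss depends on the weights only through the end-to-end matrix,
% where \ell is convex and differentiable (e.g. an empirical risk):
\[
  L(W_1, \ldots, W_N) = \ell\!\left(W_N W_{N-1} \cdots W_1\right),
  \qquad \ell : \mathbb{R}^{d_N \times d_0} \to \mathbb{R}.
\]
% Width condition on the hidden layers: each is at least as wide as
% the narrower of the input and output layers,
\[
  d_k \ge \min(d_0, d_N) \quad \text{for all } k \in \{1, \ldots, N-1\}.
\]
% Claim: under this condition, every local minimum of L is a global minimum.
```

Note that the condition constrains only the hidden widths $d_1, \ldots, d_{N-1}$; no assumption is placed on the depth $N$ or on the data entering $\ell$.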
