Corpus ID: 203734686

Pure and Spurious Critical Points: a Geometric Study of Linear Networks

  • Matthew Trager, Kathlén Kohn, Joan Bruna
  • Published 2020
  • Computer Science, Mathematics
  • ArXiv
  • The critical locus of the loss function of a neural network is determined by the geometry of the functional space and by the parameterization of this space by the network's weights. We introduce a natural distinction between pure critical points, which only depend on the functional space, and spurious critical points, which arise from the parameterization. We apply this perspective to revisit and extend the literature on the loss function of linear neural networks. For this type of network, the…
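The pure/spurious distinction can be seen in a toy case. The sketch below is a hypothetical illustration, not code from the paper: a scalar two-layer linear network f(x) = w2·w1·x, whose parameterization μ(w1, w2) = w1·w2 covers the whole functional space R, trained with an assumed squared loss ℓ(a) = (a − 1)² (the target value 1 is an arbitrary choice). By the chain rule, ∇L(w1, w2) = ℓ'(w1·w2)·(w2, w1), so every point with w1·w2 = 1 is a pure critical point, while the origin is critical only because the differential of μ vanishes there, which makes it spurious.

```python
# Hypothetical toy example (not from the paper): scalar two-layer linear network
# f(x) = w2 * w1 * x. The parameterization mu(w1, w2) = w1 * w2 maps weights to
# the end-to-end coefficient; the functional space is all of R (smooth).

TARGET = 1.0  # assumed target coefficient for the squared loss


def mu(w1: float, w2: float) -> float:
    """Parameterization: weights -> end-to-end linear map."""
    return w1 * w2


def ell_prime(a: float) -> float:
    """Derivative of the functional loss ell(a) = (a - TARGET)**2."""
    return 2.0 * (a - TARGET)


def grad_L(w1: float, w2: float) -> tuple[float, float]:
    """Gradient of L = ell o mu by the chain rule: ell'(mu) times d(mu)."""
    g = ell_prime(mu(w1, w2))
    return g * w2, g * w1


def classify(w1: float, w2: float, tol: float = 1e-12) -> str:
    """Label a parameter point as not critical, pure, or spurious."""
    gx, gy = grad_L(w1, w2)
    if abs(gx) > tol or abs(gy) > tol:
        return "not critical"
    # Critical for L. Pure if mu(theta) is already critical for ell on the
    # functional space; spurious if the zero gradient comes only from the
    # degenerate differential of mu.
    return "pure" if abs(ell_prime(mu(w1, w2))) <= tol else "spurious"


print(classify(2.0, 0.5))  # -> pure (w1 * w2 = 1)
print(classify(0.0, 0.0))  # -> spurious
print(classify(1.0, 0.0))  # -> not critical
```

In this toy case the spurious point at the origin is a saddle of L: moving along w1 = w2 lowers the loss, while moving along w1 = −w2 raises it.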
    3 Citations


    Training Linear Neural Networks: Non-Local Convergence and Complexity Results
    • 3 citations
    Learning deep linear neural networks: Riemannian gradient flows and convergence to global minimizers
    • 4 citations
    • Highly Influenced
    On the Expressive Power of Deep Polynomial Neural Networks
    • 22 citations

    References
    The Loss Surfaces of Multilayer Networks
    • 742 citations
    Critical Points of Neural Networks: Analytical Forms and Landscape Properties
    • Yi Zhou, Y. Liang
    • Computer Science, Mathematics
    • ArXiv
    • 2017
    • 39 citations
    A Critical View of Global Optimality in Deep Learning
    • 30 citations
    Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys
    • 59 citations
    Spurious Valleys in Two-layer Neural Network Optimization Landscapes
    • 41 citations
    The loss surface of deep linear networks viewed through the algebraic geometry lens
    • 11 citations
    Small nonlinearities in activation functions create bad local minima in neural networks
    • 50 citations
    Neural networks and principal component analysis: Learning from examples without local minima
    • 1,141 citations
    • Highly Influential
    On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
    • 204 citations