Corpus ID: 238743895

The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program

@article{Wang2021TheCG,
  title={The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program},
  author={Yifei Wang and Mert Pilanci},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.06488}
}
We study non-convex subgradient flows for training two-layer ReLU neural networks from a convex geometry and duality perspective. We characterize the implicit bias of unregularized non-convex gradient flow as convex regularization of an equivalent convex model. We then show that the limit points of non-convex subgradient flows can be identified via primal-dual correspondence in this convex optimization problem. Moreover, we derive a sufficient condition on the dual variables which ensures that… 
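
To fix ideas, here is a rough sketch of the objects the abstract refers to, assuming the standard convex reformulation of two-layer ReLU networks developed in the references listed below; the data matrix $X \in \mathbb{R}^{n \times d}$, labels $y$, activation-pattern matrices $D_i$, and regularization weight $\beta$ are notational assumptions, not taken from this page. The non-convex training objective
$\min_{\{u_j, \alpha_j\}} \tfrac{1}{2} \big\| \sum_j (X u_j)_+ \alpha_j - y \big\|_2^2$
has, with weight decay $\beta$ and after rescaling, an equivalent convex program
$\min_{\{v_i, w_i\}} \tfrac{1}{2} \big\| \sum_i D_i X (v_i - w_i) - y \big\|_2^2 + \beta \sum_i \big( \|v_i\|_2 + \|w_i\|_2 \big)$ subject to $(2D_i - I) X v_i \ge 0$ and $(2D_i - I) X w_i \ge 0$,
where the matrices $D_i = \mathrm{diag}(\mathbb{1}[X u \ge 0])$ enumerate the ReLU activation patterns. The corresponding dual program constrains a dual vector $\lambda$ through $\max_{\|u\|_2 \le 1} |\lambda^\top (X u)_+| \le \beta$. On this reading, the dual convex program in the title is of this form, and the convex regularization mentioned in the abstract plausibly corresponds to the group norm penalty $\sum_i (\|v_i\|_2 + \|w_i\|_2)$ above.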

References

SHOWING 1-10 OF 25 REFERENCES
Convex Geometry and Duality of Over-parameterized Neural Networks
TLDR: A convex analytic framework for ReLU neural networks is developed which elucidates the inner workings of hidden neurons and their function space characteristics, and establishes a connection to $\ell_0$-$\ell_1$ equivalence for neural networks analogous to the minimal cardinality solutions in compressed sensing.
Neural Spectrahedra and Semidefinite Lifts: Global Convex Optimization of Polynomial Activation Neural Networks in Fully Polynomial-Time
TLDR: This paper develops exact convex optimization formulations for two-layer neural networks with second-degree polynomial activations based on semidefinite programming, and extends the results beyond the fully connected architecture to different neural network architectures, including networks with vector outputs and convolutional architectures with pooling.
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
TLDR: The implicit regularization of the gradient descent algorithm in homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations, is studied, and it is proved that both the normalized margin and its smoothed version converge to the objective value at a KKT point of the optimization problem.
Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs
TLDR: It is proved that the equivalent convex problem can be globally optimized by a standard convex optimization solver with polynomial-time complexity with respect to the number of samples and the data dimension when the width of the network is fixed.
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
TLDR: It is shown that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers; the analysis involves Wasserstein gradient flows, a by-product of optimal transport theory.
Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-Layer Networks
TLDR: It is shown that ReLU networks trained with standard weight decay are equivalent to block $\ell_1$-penalized convex models, and that certain standard convolutional linear networks are equivalent to semi-definite programs which can be simplified to regularized linear models in a polynomial-sized discrete Fourier feature space.
Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions
TLDR: This work analyzes the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators exposes the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games.
A mean field view of the landscape of two-layer neural networks
TLDR: A compact description of the SGD dynamics is derived in terms of a limiting partial differential equation that allows for “averaging out” some of the complexities of the landscape of neural networks and can be used to prove a general convergence result for noisy SGD.
Training Quantized Neural Networks to Global Optimality via Semidefinite Programming
TLDR: Surprisingly, it is shown that certain quantized NN problems can be solved to global optimality, provably in polynomial time in all relevant parameters, via tight semidefinite relaxations.
The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets, and show that the predictor converges in direction to the max-margin (hard-margin SVM) solution.
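
As an illustrative numerical sketch of the result summarized in this last reference (not code from either paper; the dataset, step size, and iteration counts below are arbitrary choices), gradient descent on unregularized logistic regression over separable data lets the weight norm grow without bound while the normalized direction w/||w|| stabilizes:

import numpy as np

# Two well-separated Gaussian clusters: a linearly separable toy dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(50, 2)),
               rng.normal(-2.0, 0.5, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w = np.zeros(2)
lr = 0.1
for t in range(1, 100001):
    margins = y * (X @ w)
    # sigmoid(-margin), with clipping to avoid overflow in exp.
    s = 1.0 / (1.0 + np.exp(np.clip(margins, -30.0, 30.0)))
    # Gradient of the mean logistic loss: -mean_i y_i x_i sigmoid(-margin_i).
    grad = -(X * (y * s)[:, None]).mean(axis=0)
    w -= lr * grad
    if t in (100, 1000, 10000, 100000):
        print(f"step {t:>6}  ||w|| = {np.linalg.norm(w):7.3f}  "
              f"w/||w|| = {w / np.linalg.norm(w)}")

The printed norm keeps growing while the printed direction settles, consistent with convergence in direction to the max-margin separator of the two clusters.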