Selection dynamics for deep neural networks

Authors: Hailiang Liu and Peter A. Markowich
Venue: arXiv: Analysis of PDEs
This paper presents a partial differential equation framework for deep residual neural networks and for the associated learning problem. This is done by carrying out the continuum limits of neural networks with respect to width and depth. We study the wellposedness, the large time solution behavior, and the characterization of the steady states of the forward problem. Several useful time-uniform estimates and stability/instability conditions are presented. We state and prove optimality…
Mean-field Langevin System, Optimal Control and Deep Neural Networks
A system of mean-field Langevin equations, the invariant measure of which is shown to be the optimal control of the initial problem under mild conditions, is introduced and endorses the solvability of the stochastic gradient descent algorithm for a wide class of deep neural networks.
On the space-time expressivity of ResNets.
It is shown that by increasing the number of residual blocks as well as their expressivity the solution of an arbitrary ODE can be approximated in space and time simultaneously by deep ReLU ResNets.
Game on Random Environment, Mean-field Langevin System and Neural Networks
It is proved that the marginal laws of the corresponding mean-field Langevin systems can converge towards the games' equilibria in different settings, a result that applies to the analysis of the stochastic gradient descent algorithm for deep neural networks in supervised learning as well as for generative adversarial networks.
Data-driven optimal control of a SEIR model for COVID-19
This paper investigates a basic Susceptible-Exposed-Infectious-Recovered (SEIR) model, calibrates the model parameters to the reported data, and provides efficient numerical algorithms based on a generalized Pontryagin Maximum Principle from optimal control theory.
Deep Neural Networks, Generic Universal Interpolation, and Controlled ODEs
It is shown that universal interpolation holds for certain deep neural networks even if large numbers of parameters are left untrained, and are instead chosen randomly, which lends theoretical support to the observation that training with random initialization can be successful even when most parameters are largely unchanged through the training.
Large-time asymptotics in deep learning
It is shown that, in long time, the optimal states converge to zero training error, and this result provides an alternative theoretical underpinning to the notion that neural networks learn best in the overparametrized regime, when seen from the large layer perspective.
Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
A Pontryagin differentiable programming methodology is developed, which establishes a unified framework to solve a broad class of learning and control tasks and investigates three learning modes of the PDP: inverse reinforcement learning, system identification, and control/planning, respectively.
Deep Learning via Dynamical Systems: An Approximation Perspective
The results reveal that composition function approximation through flow maps presents a new paradigm in approximation theory and contributes to building a useful mathematical framework to investigate deep learning.
Mean-Field Neural ODEs via Relaxed Optimal Control
We develop a framework for the analysis of deep neural networks and neural ODE models that are trained with stochastic gradient algorithms. We do that by identifying the connections between control…


A mean-field optimal control formulation of deep learning
This paper introduces the mathematical formulation of the population risk minimization problem in deep learning as a mean-field optimal control problem, and states and proves optimality conditions of both the Hamilton–Jacobi–Bellman type and the Pontryagin type.
An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks
The discrete-time method of successive approximations (MSA), which is based on Pontryagin's maximum principle, is introduced for training neural networks with weights that are constrained to take values in a discrete set.
Maximum Principle Based Algorithms for Deep Learning
The continuous dynamical system approach to deep learning is explored in order to devise alternative frameworks for training algorithms using Pontryagin's maximum principle, demonstrating that it obtains a favorable initial convergence rate per iteration, provided Hamiltonian maximization can be carried out efficiently.
Deep Residual Learning and PDEs on Manifold
The deep residual network (ResNet) is formulated as a control problem for a transport equation, in which the transport equation is solved along its characteristics, and several models based on the transport equation and the Hamilton-Jacobi equation are proposed.
Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations
It is shown that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations, and a connection is established between stochastic control and noise injection in the training process, which helps to improve the generalization of the networks.
Deep Neural Networks Motivated by Partial Differential Equations
A new PDE interpretation of a class of deep convolutional neural networks (CNNs) that are commonly used to learn from speech, image, and video data is established, and three new ResNet architectures are derived that fall into two new classes: parabolic and hyperbolic CNNs.
Deep Neural Network Approximation Theory
Deep networks provide exponential approximation accuracy (i.e., the approximation error decays exponentially in the number of nonzero weights in the network) for the multiplication operation, polynomials, sinusoidal functions, and certain smooth functions.
Generalization of Back propagation to Recurrent and Higher Order Neural Networks
A general method is introduced for deriving backpropagation algorithms for recurrent and higher order networks, and it is extended to a constrained dynamical system for training a content addressable memory.
Reversible Architectures for Arbitrarily Deep Residual Neural Networks
From this interpretation, a theoretical framework on stability and reversibility of deep neural networks is developed, and three reversible neural network architectures that can go arbitrarily deep in theory are derived.
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
The speed of convergence to a global optimum is analyzed for gradient descent training a deep linear neural network by minimizing the $\ell_2$ loss over whitened data.