Corpus ID: 232068953

Sparse approximation in learning via neural ODEs

Carlos Yagüe and Borjan Geshkovski
We consider the continuous-time, neural ordinary differential equation (neural ODE) perspective of deep supervised learning, and study the impact of the final time horizon T in training. We focus on a cost consisting of an integral of the empirical risk over the time interval, and L^1-parameter regularization. Under homogeneity assumptions on the dynamics (typical for ReLU activations), we prove that any global minimizer is sparse, in the sense that there exists a positive stopping time T…
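The kind of cost described in the abstract can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar dynamics x'(t) = w(t) ReLU(a(t) x(t) + b(t)), the squared loss, the forward Euler discretization, the weight `lam`, and the hypothetical function name `running_cost`.

```python
def relu(z):
    """ReLU activation for a scalar input."""
    return z if z > 0.0 else 0.0


def running_cost(params, data, horizon, lam=0.1):
    """Approximate the integral over [0, horizon] of
    (empirical risk) + lam * (L^1 norm of the parameters).

    params: list of (w, a, b) tuples, one per Euler step
            (piecewise-constant parameters; an assumption of this sketch)
    data:   list of (x0, y) scalar pairs (initial state, target)
    Returns the approximate cost and the final states.
    """
    dt = horizon / len(params)
    states = [x0 for x0, _ in data]
    total = 0.0
    for w, a, b in params:
        # Empirical risk at the current time (squared loss, assumed),
        # integrated with a left-endpoint rule.
        risk = sum((x - y) ** 2 for x, (_, y) in zip(states, data)) / len(data)
        l1_norm = abs(w) + abs(a) + abs(b)
        total += dt * (risk + lam * l1_norm)
        # Forward Euler step of the assumed dynamics x' = w * relu(a * x + b).
        states = [x + dt * w * relu(a * x + b) for x in states]
    return total, states


if __name__ == "__main__":
    data = [(1.0, 1.0), (2.0, 0.0)]
    # Parameters that are active on the first half of the horizon and zero afterwards.
    params = [(-1.0, 1.0, 0.0)] * 2 + [(0.0, 0.0, 0.0)] * 2
    cost, final_states = running_cost(params, data, horizon=1.0)
    print(cost, final_states)
```

In this sketch, a time step with all parameters equal to zero leaves the states unchanged and adds no regularization cost, so only the empirical risk keeps accumulating on that part of the horizon.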


Interpolation and approximation via Momentum ResNets and Neural ODEs
In this article, we explore the effects of memory terms in continuous-layer Deep Residual Networks by studying Neural ODEs (NODEs). We investigate two types of models. On one side, we consider the
A framework for randomized time-splitting in linear-quadratic optimal control
Inspired by the successes of stochastic algorithms in the training of deep neural networks and the simulation of interacting particle systems, we propose and analyze a framework for randomized
Optimal actuator design via Brunovsky's normal form
By using the Brunovsky normal form, this paper provides a reformulation of the problem consisting in finding the actuator design which minimizes the controllability cost for finite-dimensional linear systems with scalar controls and allows for an easy deduction of existence of solutions.
Neural ODE control for classification, approximation and transport
We analyze Neural Ordinary Differential Equations (NODEs) from a control theoretical perspective to address some of the main properties and paradigms of Deep Learning (DL), in particular, data
Turnpike in Lipschitz-nonlinear optimal control
This strategy combines the construction of suboptimal quasi-turnpike trajectories via controllability, and a bootstrap argument, and does not rely on analyzing the optimality system or linearization techniques, which allows it to address several optimal control problems for finite-dimensional, control-affine systems with globally Lipschitz nonlinearities.


On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
It is shown that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers and involves Wasserstein gradient flows, a by-product of optimal transport theory.
Deep learning as optimal control problems: models and numerical methods
This work considers recent work of Haber and Ruthotto 2017 and Chang et al. 2018, where deep learning neural networks have been interpreted as discretisations of an optimal control problem subject to an ordinary differential equation constraint, and compares these deep learning algorithms numerically in terms of induced flow and generalisation ability.
Control On the Manifolds Of Mappings As a Setting For Deep Learning
The present contribution uses a control-theoretic setting to model the training (deep learning) of artificial neural networks (ANNs) aimed at solving classification problems; results include examples of control systems that are approximately controllable in the groups of diffeomorphisms.
Neural Ordinary Differential Equations
This work shows how to scalably backpropagate through any ODE solver, without access to its internal operations, which allows end-to-end training of ODEs within larger models.
On the Turnpike to Design of Deep Neural Nets: Explicit Depth Bounds
This paper proves explicit bounds on the required depths of DNNs based on asymptotic reachability assumptions and a dissipativity-inducing choice of the regularization terms in the training problem.
Maximum Principle Based Algorithms for Deep Learning
The continuous dynamical system approach to deep learning is explored in order to devise alternative frameworks for training algorithms using Pontryagin's maximum principle, demonstrating that it obtains a favorable initial convergence rate per iteration, provided Hamiltonian maximization can be efficiently carried out.
Continuous-in-Depth Neural Networks
This work shows that neural network models can learn to represent continuous dynamical systems, with their richer structure and properties, by embedding them into higher-order numerical integration schemes, such as Runge-Kutta schemes, and introduces ContinuousNet as a continuous-in-depth generalization of ResNet architectures.
Mean-field sparse optimal control
The technical derivation of the sparse mean-field optimal control is realized by the simultaneous development of the mean-field limit of the equations governing the followers' dynamics together with the Γ-limit of the finite-dimensional sparse optimal control problems.
Variational Networks: An Optimal Control Approach to Early Stopping Variational Methods for Image Restoration
A nonlinear spectral analysis of the gradient of the learned regularizer gives insight into the different regularization properties, and first- and second-order conditions for verifying the optimal stopping time are developed.