# Sparse approximation in learning via neural ODEs

```bibtex
@article{Yage2021SparseAI,
  title   = {Sparse approximation in learning via neural ODEs},
  author  = {Carlos Yag{\"u}e and Borjan Geshkovski},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2102.13566}
}
```

We consider the continuous-time, neural ordinary differential equation (neural ODE) perspective of deep supervised learning, and study the impact of the final time horizon T in training. We focus on a cost consisting of an integral of the empirical risk over the time interval, and L1-parameter regularization. Under homogeneity assumptions on the dynamics (typical for ReLU activations), we prove that any global minimizer is sparse, in the sense that there exists a positive stopping time T…
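The cost functional described in the abstract (an integral-in-time empirical risk plus L1 regularization of the time-dependent parameters, with ReLU dynamics) can be sketched concretely. The following is a minimal NumPy sketch using an explicit-Euler discretization of a neural ODE of the form x'(t) = W(t) relu(A(t) x(t) + b(t)); the function names, the piecewise-constant parameterization, and the quadratic choice of risk are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def relu(z):
    """ReLU activation; positively 1-homogeneous, as in the paper's assumptions."""
    return np.maximum(z, 0.0)

def forward(x0, params, T):
    """Explicit-Euler rollout of x'(t) = W(t) relu(A(t) x(t) + b(t)).

    `params` holds one (W, A, b) triple per time step, i.e. piecewise-constant
    controls on a uniform grid of [0, T].
    """
    dt = T / len(params)
    x = np.asarray(x0, dtype=float).copy()
    traj = [x.copy()]
    for W, A, b in params:
        x = x + dt * W @ relu(A @ x + b)
        traj.append(x.copy())
    return traj

def cost(x0, y, params, T, lam):
    """Discretized cost: integral-in-time quadratic risk + L1 parameter penalty."""
    dt = T / len(params)
    traj = forward(x0, params, T)
    # Riemann-sum approximation of the integral of the empirical risk over [0, T]
    risk = sum(dt * 0.5 * np.sum((x - y) ** 2) for x in traj[1:])
    # Riemann-sum approximation of the L1 norm of the parameters over [0, T]
    reg = sum(dt * (np.abs(W).sum() + np.abs(A).sum() + np.abs(b).sum())
              for W, A, b in params)
    return risk + lam * reg
```

In this discretized picture, the sparsity result of the paper would manifest as minimizers whose (W, A, b) triples vanish identically on all time steps beyond the stopping time T*.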

## 5 Citations

Interpolation and approximation via Momentum ResNets and Neural ODEs

- Mathematics
- 2021

In this article, we explore the effects of memory terms in continuous-layer Deep Residual Networks by studying Neural ODEs (NODEs). We investigate two types of models. On one side, we consider the…

A framework for randomized time-splitting in linear-quadratic optimal control

- Mathematics
- 2021

Inspired by the successes of stochastic algorithms in the training of deep neural networks and the simulation of interacting particle systems, we propose and analyze a framework for randomized…

Optimal actuator design via Brunovsky's normal form

- Computer Science, Mathematics · ArXiv
- 2021

Using the Brunovsky normal form, this paper reformulates the problem of finding the actuator design that minimizes the controllability cost for finite-dimensional linear systems with scalar controls, a reformulation that allows the existence of solutions to be deduced easily.

Neural ODE control for classification, approximation and transport

- Mathematics
- 2021

We analyze Neural Ordinary Differential Equations (NODEs) from a control theoretical perspective to address some of the main properties and paradigms of Deep Learning (DL), in particular, data…

Turnpike in Lipschitz-nonlinear optimal control

- Computer Science, Mathematics · ArXiv
- 2020

The strategy combines the construction of suboptimal quasi-turnpike trajectories via controllability with a bootstrap argument; it does not rely on analyzing the optimality system or on linearization techniques, which allows it to address several optimal control problems for finite-dimensional, control-affine systems with globally Lipschitz nonlinearities.

## References

Showing 1-10 of 64 references

On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

- Mathematics, Computer Science · NeurIPS
- 2018

It is shown that, when initialized correctly and in the many-particle limit, this gradient flow, although non-convex, converges to global minimizers; the analysis involves Wasserstein gradient flows, a by-product of optimal transport theory.

Deep learning as optimal control problems: models and numerical methods

- Computer Science, Mathematics · Journal of Computational Dynamics
- 2019

This work builds on Haber and Ruthotto (2017) and Chang et al. (2018), where deep neural networks are interpreted as discretisations of an optimal control problem subject to an ordinary differential equation constraint, and compares these deep learning algorithms numerically in terms of the induced flow and generalisation ability.

Control On the Manifolds Of Mappings As a Setting For Deep Learning

- Mathematics, Computer Science · ArXiv
- 2020

The present contribution uses a control-theoretic setting to model the training process (deep learning) of Artificial Neural Networks (ANNs) aimed at solving classification problems; the results include examples of control systems that are approximately controllable in groups of diffeomorphisms.

Interpolation and approximation via Momentum ResNets and Neural ODEs

- Mathematics
- 2021

In this article, we explore the effects of memory terms in continuous-layer Deep Residual Networks by studying Neural ODEs (NODEs). We investigate two types of models. On one side, we consider the…

Neural Ordinary Differential Equations

- Computer Science, Mathematics · NeurIPS
- 2018

This work shows how to scalably backpropagate through any ODE solver, without access to its internal operations, which allows end-to-end training of ODEs within larger models.

On the Turnpike to Design of Deep Neural Nets: Explicit Depth Bounds

- Computer Science, Engineering · ArXiv
- 2021

This paper proves explicit bounds on the required depths of DNNs based on asymptotic reachability assumptions and a dissipativity-inducing choice of the regularization terms in the training problem.

Maximum Principle Based Algorithms for Deep Learning

- Computer Science, Mathematics · J. Mach. Learn. Res.
- 2017

The continuous dynamical system approach to deep learning is explored in order to devise alternative training frameworks based on Pontryagin's maximum principle, demonstrating a favorable initial per-iteration convergence rate, provided Hamiltonian maximization can be carried out efficiently.

Continuous-in-Depth Neural Networks

- Computer Science, Mathematics · ArXiv
- 2020

This work shows that neural network models can learn to represent continuous dynamical systems, with this richer structure and properties, by embedding them into higher-order numerical integration schemes such as Runge-Kutta schemes, and introduces ContinuousNet as a continuous-in-depth generalization of ResNet architectures.

Mean-field sparse optimal control

- Mathematics, Medicine · Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
- 2014

The technical derivation of the sparse mean-field optimal control is realized by developing the mean-field limit of the equations governing the followers' dynamics simultaneously with the Γ-limit of the finite-dimensional sparse optimal control problems.

Variational Networks: An Optimal Control Approach to Early Stopping Variational Methods for Image Restoration

- Computer Science, Mathematics · Journal of Mathematical Imaging and Vision
- 2020

A nonlinear spectral analysis of the gradient of the learned regularizer gives enlightening insights into the different regularization properties, and first- and second-order conditions to verify the optimal stopping time are developed.