Corpus ID: 221739228

A priori guarantees of finite-time convergence for Deep Neural Networks

@article{Rankawat2020APG,
  title={A priori guarantees of finite-time convergence for Deep Neural Networks},
  author={Anushree Rankawat and Mansi Rankawat and Harshal B. Oza},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.07509}
}
In this paper, we perform a Lyapunov-based analysis of the loss function to derive an a priori upper bound on the settling time of deep neural networks. While previous studies have attempted to understand deep learning within a control-theoretic framework, there is limited work on a priori finite-time convergence analysis. Drawing from advances in the analysis of finite-time control of non-linear systems, we provide a priori guarantees of finite-time convergence in a deterministic control-theoretic…
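
A minimal sketch of the type of Lyapunov differential inequality such finite-time analyses rely on, with generic constants $c$ and $\alpha$ and the loss playing the role of the Lyapunov function $V$ (assumed notation, not the paper's exact statement): if along the training dynamics

$$\dot V(t) \le -c\,V(t)^{\alpha}, \qquad c>0,\; 0<\alpha<1,$$

then integrating the inequality gives $V(t)=0$ for all $t \ge T$, with a settling time bounded a priori by the initial value alone,

$$T \;\le\; \frac{V(0)^{1-\alpha}}{c\,(1-\alpha)}.$$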

References

SHOWING 1-10 OF 23 REFERENCES

A Multistep Lyapunov Approach for Finite-Time Analysis of Biased Stochastic Approximation

A novel multistep, looking-ahead viewpoint makes finite-time analysis of biased SA algorithms possible under a large family of stochastic perturbations; a general result on the convergence of the iterates is proved and then used to derive non-asymptotic bounds on the mean-square error for constant stepsizes.
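
For context, biased stochastic approximation is commonly written in the generic form below; the split into a mean field $h$, a bias $b_k$, and a stochastic perturbation $m_{k+1}$ is illustrative notation, not this reference's own:

$$\theta_{k+1} = \theta_k + \alpha_k\left(h(\theta_k) + b_k + m_{k+1}\right),$$

and, roughly speaking, the multistep (looking-ahead) Lyapunov argument controls the accumulated effect of $b_k$ and $m_{k+1}$ over several consecutive iterations rather than a single step.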

Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective

This article provides one possible way to align existing branches of deep learning theory through the lens of dynamical systems and optimal control, and offers a principled approach to hyper-parameter tuning once optimal control theory is introduced.

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

The speed of convergence to a global optimum is analyzed for gradient descent training of a deep linear neural network minimizing the $\ell_2$ loss over whitened data, under conditions on the initialization, notably that the initial loss is smaller than the loss of any rank-deficient solution.
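
A minimal statement of the setting, assuming the usual $\ell_2$ objective for an $N$-layer linear network over $m$ whitened samples (generic notation):

$$L(W_1,\dots,W_N) \;=\; \frac{1}{2m}\sum_{i=1}^{m}\bigl\|W_N W_{N-1}\cdots W_1\,x_i - y_i\bigr\|^2, \qquad \frac{1}{m}\sum_{i=1}^{m} x_i x_i^{\top} = I.$$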

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

It is shown that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.
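
As a sketch of the dynamics in question, for a single-hidden-layer linear network $y = W^2 W^1 x$ trained on squared error, the gradient-flow limit with whitened inputs is commonly written as (generic notation, with $\Sigma^{yx} = \mathbb{E}[y\,x^{\top}]$ and time constant $\tau$):

$$\tau\,\frac{dW^1}{dt} = (W^2)^{\top}\bigl(\Sigma^{yx} - W^2 W^1\bigr), \qquad \tau\,\frac{dW^2}{dt} = \bigl(\Sigma^{yx} - W^2 W^1\bigr)(W^1)^{\top},$$

which can be solved in closed form along the singular modes of $\Sigma^{yx}$ under suitably aligned initial conditions, producing the plateau-and-transition learning curves described above.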

Convergence Analysis of Two-layer Neural Networks with ReLU Activation

A convergence analysis of SGD is provided for a rich subset of two-layer feedforward networks with ReLU activations, characterized by a special structure called "identity mapping". It is proved that, if the input follows a Gaussian distribution, then with the standard $O(1/\sqrt{d})$ initialization of the weights, SGD converges to the global minimum in a polynomial number of steps.
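
For reference, the stated initialization corresponds to weight entries drawn at the $1/\sqrt{d}$ scale; a generic two-layer ReLU network of this kind (the paper's specific identity-mapping structure is not reproduced here) is

$$f(x) = \sum_{j=1}^{k} a_j\,\sigma\bigl(w_j^{\top}x\bigr), \qquad \sigma(z)=\max(z,0), \qquad x \sim \mathcal{N}(0, I_d), \qquad (w_j)_i \sim \mathcal{N}\!\bigl(0, \tfrac{1}{d}\bigr),$$

so that each weight vector has norm of order one at initialization.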

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

The bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain.
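
Linear two-timescale stochastic approximation is typically a pair of coupled recursions; the matrices $A_{ij}$, vectors $b_i$, and noise terms $\xi_{k+1}, \psi_{k+1}$ below are generic illustrative notation:

$$\theta_{k+1} = \theta_k + \alpha_k\bigl(b_1 + A_{11}\theta_k + A_{12} w_k + \xi_{k+1}\bigr), \qquad w_{k+1} = w_k + \beta_k\bigl(b_2 + A_{21}\theta_k + A_{22} w_k + \psi_{k+1}\bigr),$$

with $\beta_k/\alpha_k \to 0$ so that $w_k$ evolves on the slower timescale; per the result above, the mixing time of the underlying Markov chain then enters the finite-time bounds only through the constants.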

Reinforcement Learning Output Feedback NN Control Using Deterministic Learning Technique

A novel adaptive-critic-based neural network (NN) controller is investigated for nonlinear pure-feedback systems, and a deterministic learning technique is employed to guarantee that the partial persistent excitation condition of the internal states is satisfied during tracking control of a periodic reference orbit.
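
The persistent excitation (PE) condition referred to above is the standard one from adaptive control; in generic notation, a bounded regressor signal $\phi(t)$ is persistently exciting if there exist constants $T, \beta > 0$ such that

$$\int_{t}^{t+T}\phi(\tau)\,\phi(\tau)^{\top}\,d\tau \;\ge\; \beta I \qquad \text{for all } t \ge 0,$$

and the "partial" PE condition refers, roughly, to this holding for the regressor components that are actually excited along the periodic reference orbit.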

A recurrent neural network for solving Sylvester equation with time-varying coefficients

The recurrent neural network with implicit dynamics is deliberately developed in such a way that its trajectory is guaranteed to converge exponentially to the time-varying solution of a given Sylvester equation.
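
A standard design of this type, with a generic gain $\gamma > 0$ (sign conventions for the Sylvester equation vary across papers), defines the time-varying error and forces it to decay exponentially:

$$E(t) = A(t)X(t) - X(t)B(t) - C(t), \qquad \dot{E}(t) = -\gamma\,E(t) \;\Longrightarrow\; \|E(t)\| = e^{-\gamma t}\,\|E(0)\|;$$

expanding $\dot{E}$ in terms of $\dot{X}$ then yields an implicit differential equation in $X(t)$, matching the implicit dynamics mentioned above.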

On the Convergence Rate of Training Recurrent Neural Networks

It is shown that when the number of neurons is sufficiently large, meaning polynomial in the training data size and in the linear convergence rate, SGD is capable of minimizing the regression loss at a linear convergence rate; this gives theoretical evidence of how RNNs can memorize data.
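
The setting is multi-step regression with a recurrent network; a generic Elman-style formulation (assumed notation, not the paper's exact model) is

$$h_{\ell} = \sigma\bigl(W h_{\ell-1} + A x_{\ell}\bigr), \qquad y_{\ell} = B\,h_{\ell}, \qquad \text{loss} = \sum_{\ell}\tfrac{1}{2}\bigl\|y_{\ell} - y_{\ell}^{*}\bigr\|^{2},$$

with the number of neurons (the width of $W$) taken polynomially large in the training data size so that SGD attains the linear convergence rate stated above.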