• Corpus ID: 221739228

# A priori guarantees of finite-time convergence for Deep Neural Networks

@article{Rankawat2020APG,
  title   = {A priori guarantees of finite-time convergence for Deep Neural Networks},
  author  = {Anushree Rankawat and Mansi Rankawat and Harshal B. Oza},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2009.07509}
}
• Published 16 September 2020
• Computer Science, Mathematics
• ArXiv
In this paper, we perform a Lyapunov-based analysis of the loss function to derive an a priori upper bound on the settling time of deep neural networks. While previous studies have attempted to understand deep learning through a control-theoretic framework, there is limited work on a priori finite-time convergence analysis. Drawing on advances in the analysis of finite-time control of nonlinear systems, we provide a priori guarantees of finite-time convergence in a deterministic control theoretic…
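The abstract is truncated, so the paper's exact bound is not visible here; as context, the standard finite-time Lyapunov result the analysis draws on (in the style of Bhat and Bernstein) takes the following form. If a positive-definite function $V$ along the trajectories satisfies, for constants $c > 0$ and $0 < \alpha < 1$,

$$
\dot{V}(x(t)) \le -c\, V(x(t))^{\alpha},
$$

then separating variables and integrating from $V(x_0)$ down to $0$ gives an a priori upper bound on the settling time:

$$
T(x_0) \le \frac{V(x_0)^{1-\alpha}}{c\,(1-\alpha)},
$$

i.e., the state reaches the equilibrium in finite time, with a bound computable from the initial condition alone. In the paper's setting, $V$ would play the role of the loss function; the specific constants and the exact form of the bound are given in the paper itself.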
