On the Linear Convergence of Natural Policy Gradient Algorithm
- Computer Science
- 2021 60th IEEE Conference on Decision and Control (CDC)
Improved finite-time convergence bounds are presented, and it is shown that the Natural Policy Gradient, which forms the basis of several popular RL algorithms, has a geometric (also known as linear) asymptotic convergence rate.
References (1-10 of 351)
Essentials of Stochastic Processes
Basic concepts.- Additive processes.- Stationary processes.- Markov processes.- Diffusion.- Postscript.
Theory of point estimation
This paper presents a meta-analysis of large-sample theory and its applications in the context of discrete-time reinforcement learning, which aims to clarify the role of reinforcement learning in the reinforcement-gauging process.
Efficient and Adaptive Estimation for Semiparametric Models
Introduction.- Asymptotic Inference for (Finite-Dimensional) Parametric Models.- Information Bounds for Euclidean Parameters in Infinite-Dimensional Models.- Euclidean Parameters: Further Examples.-…
Computational Implications of Reducing Data to Sufficient Statistics
- Computer Science
- ArXiv
It is shown that reducing data to sufficient statistics can change a computationally tractable estimation problem into an intractable one.
Structure adaptive approach for dimension reduction
We propose a new method of effective dimension reduction for a multiindex model which is based on iterative improvement of the family of average derivative estimates. The procedure is computationally…
Semiparametric and Nonparametric Methods in Econometrics
- Mathematics, Economics
Single-Index Models.- Nonparametric Additive Models and Semiparametric Partially Linear Models.- Binary-Response Models.- Statistical Inverse Problems.- Transformation Models.
Mathematical Foundations of Infinite-Dimensional Statistical Models
- Mathematics, Computer Science
This chapter discusses nonparametric statistical models, function spaces and approximation theory, and the minimax paradigm, and aims to provide a model for adaptive inference of likelihood-based procedures.
Acceleration of stochastic approximation by averaging
- Computer Science
Convergence with probability one is proved for a variety of classical optimization and identification problems and it is demonstrated for these problems that the proposed algorithm achieves the highest possible rate of convergence.
High-Dimensional Probability: An Introduction with Applications in Data Science
Let us summarize our findings. A random projection of a set T in R^n onto an m-dimensional subspace approximately preserves the geometry of T if m ⪆ d(T). For...
Optimal detection of sparse principal components in high dimension
- Computer Science
The minimax optimal test is based on a sparse eigenvalue statistic, and a computationally efficient alternative test using convex relaxations is described, which is proved to detect sparse principal components at near optimal detection levels and performs well on simulated datasets.