Corpus ID: 207881069

Improving reinforcement learning algorithms: towards optimal learning rate policies

Othmane Mounjid, Charles-Albert Lehalle
This paper investigates to what extent one can improve reinforcement learning algorithms. Our study is split into three parts. First, our analysis shows that the classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic and can be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\leq \beta \leq 1$, where $N$ is the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochastic approximation (SA). We… 
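To make the role of the learning rate $(\gamma_k)_{k\geq 0}$ in stochastic approximation concrete, here is a minimal Robbins–Monro sketch on a toy root-finding problem. The target, the noise model, and the polynomial schedule $\gamma_k = c/(k+1)^{\beta}$ are illustrative assumptions, not the paper's optimal policy:

```python
import random

def robbins_monro(theta_star=2.0, n_iters=5000, beta=1.0, c=1.0, seed=0):
    """Robbins-Monro stochastic approximation of the root of
    h(theta) = theta - theta_star, observed with additive Gaussian noise.
    The schedule gamma_k = c / (k + 1)**beta is a standard illustrative
    choice; the paper studies how to pick (gamma_k) optimally."""
    rng = random.Random(seed)
    theta = 0.0
    for k in range(n_iters):
        gamma_k = c / (k + 1) ** beta          # learning rate at step k
        noisy_h = (theta - theta_star) + rng.gauss(0.0, 1.0)
        theta -= gamma_k * noisy_h             # SA update
    return theta
```

With $\beta = 1$ this iteration reduces to a running average of the noisy observations, which is why the schedule exponent directly controls the convergence rate.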


Learning a functional control for high-frequency finance
A deep neural network is used, for the first time, to generate controllers for optimal trading on high-frequency data. It is shown that the average distance between the generated controls and their explainable version remains small, opening the door to the acceptance of ML-generated controls by financial regulators.
Optimal Execution of Foreign Securities: A Double-Execution Problem with Signatures and Machine Learning
We employ the expected signature of equity and foreign exchange markets to derive an optimal double-execution trading strategy. The signature of a path of a stochastic process is a sequence of real numbers.


SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
This work introduces a new optimisation method called SAGA, which improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser.
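As a rough illustration of the incremental-gradient idea behind SAGA (a table of the last gradient seen for each sample, used to build a variance-reduced update), here is a minimal one-dimensional sketch on least squares without a regulariser, so the proximal step reduces to the identity. The data, step size, and function names are illustrative assumptions:

```python
import random

def saga_1d(xs, ys, n_epochs=100, step=0.05, seed=0):
    """Minimal 1-D SAGA sketch for f(theta) = (1/n) * sum 0.5*(theta*x_i - y_i)^2.
    Maintains the last gradient seen for each sample and their running average."""
    rng = random.Random(seed)
    n = len(xs)
    theta = 0.0
    grad_table = [(theta * xs[i] - ys[i]) * xs[i] for i in range(n)]
    grad_avg = sum(grad_table) / n
    for _ in range(n_epochs * n):
        i = rng.randrange(n)
        g_new = (theta * xs[i] - ys[i]) * xs[i]
        # SAGA update: unbiased, variance-reduced gradient estimate
        theta -= step * (g_new - grad_table[i] + grad_avg)
        grad_avg += (g_new - grad_table[i]) / n   # keep the average in sync
        grad_table[i] = g_new
    return theta
```

Because the stored gradients converge to their values at the optimum, the variance of the update vanishes, which is what allows constant step sizes and the faster rates SAGA proves.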
Deep Reinforcement Learning for Market Making in Corporate Bonds: Beating the Curse of Dimensionality
A discrete-time method inspired by reinforcement learning techniques is proposed, namely a model-based deep actor-critic algorithm for approximating the optimal bid and ask quotes over a large universe of bonds in a model à la Avellaneda–Stoikov.
DGM: A deep learning algorithm for solving partial differential equations
High-dimensional PDEs have been a longstanding computational challenge. We propose a deep learning algorithm similar in spirit to Galerkin methods, using a deep neural network instead of a linear combination of basis functions.
Relative deviation learning bounds and generalization with unbounded loss functions
An extensive analysis of relative deviation bounds is presented, including detailed proofs of two-sided inequalities and their implications, together with a sample application: the analysis of importance weighting.
Optimal non-asymptotic bound of the Ruppert-Polyak averaging without strong convexity
This paper is devoted to the non-asymptotic control of the mean-squared error for the Ruppert-Polyak stochastic averaged gradient descent introduced in the seminal contributions of Ruppert and Polyak.
Limit Order Strategic Placement with Adverse Selection Risk and the Role of Latency
This paper is the first to make the connection between empirical evidence, a stochastic framework for limit orders that includes adverse selection, and the cost of latency; it is a first step towards shedding light on the roles of latency and adverse selection in limit order placement within an accurate stochastic control framework.
Mean field game of controls and an application to trade crowding
In this paper we formulate the now classical problem of optimal liquidation (or optimal trading) inside a mean field game (MFG). This is a noticeable change, since mathematical frameworks usually focus on a single large trader facing a background noise.
Algorithmic and High-Frequency Trading (Mathematics, Finance and Risk), 2015
Stochastic Proximal Gradient Descent with Acceleration Techniques
This paper proposes and analyzes an accelerated variant of these methods in the mini-batch setting, incorporating two acceleration techniques: Nesterov's acceleration method and variance reduction for the stochastic gradient.
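The two ingredients named above can be illustrated in their deterministic form: a FISTA-style sketch combining Nesterov acceleration with a proximal step (soft-thresholding, the prox of the $\ell_1$ norm) for a one-dimensional lasso problem. This omits the paper's stochastic mini-batch and variance-reduction components; all names and constants are illustrative:

```python
def fista_lasso_1d(b, lam, step=1.0, n_iters=100):
    """Minimal 1-D FISTA sketch for 0.5*(x - b)**2 + lam*|x|.
    Gradient of the smooth part is (x - b); the proximal operator of
    t*|.| is soft-thresholding.  The closed-form minimiser is
    sign(b) * max(|b| - lam, 0), which the iteration recovers."""
    def prox(v, t):  # soft-thresholding: prox of t*|.|
        return max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0)

    x = x_prev = 0.0
    t = 1.0
    for _ in range(n_iters):
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)   # Nesterov extrapolation
        x_prev, x = x, prox(y - step * (y - b), step * lam)  # prox-gradient step
        t = t_next
    return x
```

The stochastic variants in the paper replace the exact gradient in the prox-gradient step with a mini-batch, variance-reduced estimate while keeping this acceleration scheme.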
Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
It is proved that this method enjoys the same fast convergence rate as those of stochastic dual coordinate ascent (SDCA) and Stochastic Average Gradient (SAG), but the analysis is significantly simpler and more intuitive.
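The predictive variance-reduction idea can be sketched in one dimension as an SVRG-style loop: take a snapshot, compute the full gradient there, then run stochastic steps corrected by the snapshot gradients. The data and step size are illustrative assumptions:

```python
import random

def svrg_1d(xs, ys, n_epochs=30, inner=None, step=0.05, seed=0):
    """Minimal 1-D SVRG sketch for f(theta) = (1/n) * sum 0.5*(theta*x_i - y_i)^2.
    Each epoch: snapshot + full gradient, then variance-reduced inner steps."""
    rng = random.Random(seed)
    n = len(xs)
    inner = inner or 2 * n

    def grad_i(t, i):  # gradient of the i-th component at t
        return (t * xs[i] - ys[i]) * xs[i]

    theta = 0.0
    for _ in range(n_epochs):
        snapshot = theta
        full_grad = sum(grad_i(snapshot, i) for i in range(n)) / n
        for _ in range(inner):
            i = rng.randrange(n)
            # unbiased estimate whose variance shrinks as theta -> snapshot -> optimum
            theta -= step * (grad_i(theta, i) - grad_i(snapshot, i) + full_grad)
    return theta
```

Unlike SAGA, SVRG stores only the snapshot and one full gradient rather than a per-sample gradient table, trading memory for the periodic full-gradient pass.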