# Improving reinforcement learning algorithms: towards optimal learning rate policies

```bibtex
@article{Mounjid2019ImprovingRL,
  title   = {Improving reinforcement learning algorithms: towards optimal learning rate policies},
  author  = {Othmane Mounjid and Charles-Albert Lehalle},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1911.02319}
}
```

This paper investigates to what extent one can improve reinforcement learning algorithms. Our study is split into three parts. First, our analysis shows that the classical asymptotic convergence rate $O(1/\sqrt{N})$ is pessimistic and can be replaced by $O((\log(N)/N)^{\beta})$ with $\frac{1}{2}\leq \beta \leq 1$ and $N$ the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate $(\gamma_k)_{k\geq 0}$ used in stochastic approximation (SA). We…
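To make the role of the learning-rate sequence $(\gamma_k)_{k\geq 0}$ concrete, here is a minimal stochastic-approximation sketch (not the paper's optimal policy — just the classical Robbins–Monro setup it builds on). The target value, noise model, and schedule below are illustrative assumptions.

```python
import random

def robbins_monro(gamma, n_iter=10_000, seed=0):
    """Stochastic approximation of an unknown mean from noisy samples.

    Illustrative target: theta* = 2.0; each observation is theta* + Gaussian noise.
    Update rule: theta_{k+1} = theta_k - gamma(k) * (theta_k - X_{k+1}).
    """
    rng = random.Random(seed)
    theta = 0.0
    for k in range(n_iter):
        x = 2.0 + rng.gauss(0.0, 1.0)   # noisy observation of the target
        theta -= gamma(k) * (theta - x)  # SA step with learning rate gamma_k
    return theta

# Classical schedule gamma_k = 1/(k+1): theta reduces to the running sample mean,
# whose error decays at the O(1/sqrt(N)) rate discussed in the abstract.
est = robbins_monro(lambda k: 1.0 / (k + 1))
```

A faster-decaying or adaptive schedule changes the convergence behaviour, which is precisely the degree of freedom the paper's dynamic learning-rate policy exploits.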

## 3 Citations

Learning a functional control for high-frequency finance

- Computer Science, Mathematics, ArXiv
- 2020

A deep neural network is used to generate controllers for optimal trading on high frequency data for the first time, and it is shown that the average distance between the generated controls and their explainable version remains small, opening the door to the acceptance of ML-generated controls by financial regulators.

Optimal Execution of Foreign Securities: A Double-Execution Problem with Signatures and Machine Learning

- Business
- 2020

We employ the expected signature of equity and foreign exchange markets to derive an optimal double-execution trading strategy. The signature of a path of a stochastic process is a sequence of real…

## References

Showing 1-10 of 38 references.

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

- Computer Science, Mathematics, NIPS
- 2014

This work introduces a new optimisation method called SAGA, which improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser.
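SAGA's core idea is to keep a table of the most recently seen gradient for each sample and use it to de-noise the stochastic update. The sketch below applies that idea to a toy one-dimensional least-squares problem; the data, step size, and iteration counts are illustrative assumptions, not values from the SAGA paper.

```python
import random

def saga_least_squares(a, b, eta=0.05, steps=400, seed=0):
    """SAGA sketch for min_theta (1/2n) * sum_i (a_i*theta - b_i)^2 in 1-D.

    Maintains a table of the last gradient computed for each sample; each
    step uses g_j(theta) - table[j] + mean(table) as a variance-reduced
    gradient estimate, then refreshes table[j].
    """
    rng = random.Random(seed)
    n = len(a)
    theta = 0.0
    table = [a[i] * (a[i] * theta - b[i]) for i in range(n)]  # initial gradients
    avg = sum(table) / n
    for _ in range(steps):
        j = rng.randrange(n)
        g_new = a[j] * (a[j] * theta - b[j])       # fresh gradient of sample j
        theta -= eta * (g_new - table[j] + avg)    # variance-reduced step
        avg += (g_new - table[j]) / n              # keep the running mean in sync
        table[j] = g_new
    return theta

# Toy data with exact solution theta* = 3 (since b_i = 3 * a_i).
a = [1.0, 2.0, 0.5, 1.5]
b = [3.0 * x for x in a]
theta = saga_least_squares(a, b)
```

Because the correction term has zero mean, the update is unbiased, and as the table entries converge the gradient noise vanishes — the mechanism behind SAGA's improved convergence rates.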

Deep Reinforcement Learning for Market Making in Corporate Bonds: Beating the Curse of Dimensionality

- Economics, Computer Science, Applied Mathematical Finance
- 2019

A discrete-time method inspired by reinforcement learning techniques, namely, a model-based deep actor-critic algorithm for approximating the optimal bid and ask quotes over a large universe of bonds in a model à la Avellaneda–Stoikov.

DGM: A deep learning algorithm for solving partial differential equations

- Mathematics, Economics, Journal of Computational Physics
- 2018

High-dimensional PDEs have been a longstanding computational challenge. We propose a deep learning algorithm similar in spirit to Galerkin methods, using a deep neural network instead of linear…

Relative deviation learning bounds and generalization with unbounded loss functions

- Mathematics, Computer Science, Annals of Mathematics and Artificial Intelligence
- 2018

An extensive analysis of relative deviation bounds is presented, including detailed proofs of two-sided inequalities and their implications, together with a sample application of these results: the analysis of importance weighting.

Optimal non-asymptotic bound of the Ruppert-Polyak averaging without strong convexity

- Mathematics
- 2017

This paper is devoted to the non-asymptotic control of the mean-squared error for the Ruppert-Polyak stochastic averaged gradient descent introduced in the seminal contributions of [Rup88] and…

Limit Order Strategic Placement with Adverse Selection Risk and the Role of Latency

- Computer Science, Economics
- 2016

This paper is the first to connect empirical evidence, a stochastic framework for limit orders that includes adverse selection, and the cost of latency; it is a first step toward clarifying the roles of latency and adverse selection in limit order placement within an accurate stochastic control framework.

Mean field game of controls and an application to trade crowding

- Economics, Mathematics
- 2016

In this paper we formulate the now classical problem of optimal liquidation (or optimal trading) inside a mean field game (MFG). This is a noticeable change since usually mathematical frameworks…

Algorithmic and High-Frequency Trading (Mathematics, Finance and Risk)

- 2015

Stochastic Proximal Gradient Descent with Acceleration Techniques

- Computer Science, Mathematics, NIPS
- 2014

This paper proposes and analyzes an accelerated variant of these methods in the mini-batch setting, incorporating two acceleration techniques: Nesterov's acceleration method and variance reduction for the stochastic gradient.

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

- Computer Science, Mathematics, NIPS
- 2013

It is proved that this method enjoys the same fast convergence rate as stochastic dual coordinate ascent (SDCA) and stochastic average gradient (SAG), but with a significantly simpler and more intuitive analysis.
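The "predictive variance reduction" this entry refers to is the SVRG scheme: periodically snapshot the iterate, compute its full gradient, and correct each stochastic gradient with the snapshot's. The sketch below applies it to a toy one-dimensional least-squares problem; the data, step size, and epoch counts are illustrative assumptions.

```python
import random

def svrg_least_squares(a, b, eta=0.05, outer=20, inner=None, seed=0):
    """SVRG sketch for min_theta (1/2n) * sum_i (a_i*theta - b_i)^2 in 1-D.

    Each outer epoch snapshots theta and its full gradient; inner steps use
    the variance-reduced gradient g_i(theta) - g_i(snapshot) + full_grad.
    """
    rng = random.Random(seed)
    n = len(a)
    inner = inner or 2 * n
    theta = 0.0
    for _ in range(outer):
        snap = theta
        # Full-batch gradient at the snapshot (one pass per epoch).
        full_grad = sum(a[i] * (a[i] * snap - b[i]) for i in range(n)) / n
        for _ in range(inner):
            i = rng.randrange(n)
            g = (a[i] * (a[i] * theta - b[i])      # fresh gradient of sample i
                 - a[i] * (a[i] * snap - b[i])     # same sample at the snapshot
                 + full_grad)                      # variance-reduced estimate
            theta -= eta * g
    return theta

# Toy data with exact solution theta* = 3 (since b_i = 3 * a_i).
a = [1.0, 2.0, 0.5, 1.5]
b = [3.0 * x for x in a]
theta = svrg_least_squares(a, b)
```

Unlike SAGA, SVRG stores only one snapshot instead of a per-sample gradient table, trading memory for a periodic full-gradient pass.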