• Corpus ID: 15749446

Reinforcement Learning for Trading

@inproceedings{MoodySaffell,
  title={Reinforcement Learning for Trading},
  author={John E. Moody and Matthew Saffell}
}
We propose to train trading systems by optimizing financial objective functions via reinforcement learning. We provide new simulation results that demonstrate the presence of predictability in the monthly S&P 500 Stock Index for the 25-year period 1970 through 1994, as well as a sensitivity analysis that provides economic insight into the trader's structure.
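The direct-reinforcement formulation underlying this line of work scores a trader by its realized trading returns net of transaction costs, rather than by forecast accuracy. A minimal sketch of that return computation follows; the position series, cost rate `delta`, and toy prices are illustrative assumptions, not the authors' exact setup:

```python
import numpy as np

def trading_returns(prices, positions, delta=0.001):
    """Realized returns of a position series F_t in [-1, 1]:
    R_t = F_{t-1} * r_t - delta * |F_t - F_{t-1}|,
    where r_t is the additive price change and delta a transaction-cost rate."""
    r = np.diff(prices)                      # price changes r_t
    F = np.asarray(positions, dtype=float)   # positions F_0 .. F_T
    return F[:-1] * r - delta * np.abs(np.diff(F))

# Toy series: go long, stay long, go flat, re-enter long.
prices = np.array([100.0, 101.0, 100.5, 102.0])
positions = np.array([1.0, 1.0, 0.0, 1.0])
R = trading_returns(prices, positions)
```

Optimizing an objective built from these returns (profit, Sharpe ratio, or a differential variant) with respect to the policy producing the positions is the core idea of the paper.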

Papers citing this work

Algorithm Trading using Q-Learning and Recurrent Reinforcement Learning
This paper uses the classic reinforcement learning algorithm Q-learning to evaluate performance in terms of cumulative profit under different value functions: interval profit, Sharpe ratio, and derivative Sharpe ratio. It finds that the direct reinforcement learning framework enables a simpler problem representation than value-function-based search algorithms.
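For reference, the tabular Q-learning backup such comparisons rest on can be sketched as follows; the two-state table with long/flat actions is a hypothetical toy, not the paper's actual state encoding:

```python
def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup:
    Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])

# Hypothetical two-state table with long/flat actions.
Q = {0: {"long": 0.0, "flat": 0.0}, 1: {"long": 0.0, "flat": 0.0}}
q_update(Q, s=0, a="long", reward=1.0, s_next=1)
```

Direct reinforcement, by contrast, adjusts the policy from the reward signal without maintaining such a value table, which is the representational simplification the paper highlights.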
Contracts for Difference: A Reinforcement Learning Approach
It is proved that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market, and that an increased model size may compensate for higher latency.
Reinforcement Learning for Systematic FX Trading
This work explores online inductive transfer learning, with a feature-representation transfer from a radial basis function network formed of Gaussian mixture model hidden processing units to a direct, recurrent reinforcement learning agent; in this online transfer learning context the agent targets a risk position and achieves an annualised portfolio information ratio of 0.52.
Deep reinforcement learning for portfolio management
The experimental results show that the model is able to optimize investment decisions and obtain excess returns in the stock market; the optimized agent maintains the asset weights at fixed values throughout the trading periods and trades at a very low transaction-cost rate.
Market Making via Reinforcement Learning
A high-fidelity simulation of limit order book markets is developed, and a market making agent using temporal-difference reinforcement learning is designed using a linear combination of tile codings as a value function approximator and a custom reward function that controls inventory risk.
Quantitative Trading through Random Perturbation Q-Network with Nonlinear Transaction Costs
A novel RL trading algorithm is proposed that utilizes random perturbation of the Q-network and accounts for more realistic nonlinear transaction costs; it is used to make trading decisions based on the daily stock prices of Apple, Meta, and Bitcoin, and demonstrates its strengths over other quantitative trading methods.
Adaptive Quantitative Trading: An Imitative Deep Reinforcement Learning Approach
An adaptive trading model, namely iRDPG, is proposed, to automatically develop QT strategies by an intelligent trading agent and is enhanced by deep reinforcement learning (DRL) and imitation learning techniques.
Deep Reinforcement Learning for Active High Frequency Trading
It is argued that the DRL agents are able to create a dynamic representation of the underlying environment highlighting the occasional regularities present in the data and exploiting them to create long-term profitable trading strategies.
Factor Representation and Decision Making in Stock Markets Using Deep Reinforcement Learning
A portfolio management system that uses direct deep reinforcement learning to make optimal portfolio choices periodically among S&P 500 underlying stocks, learning a good factor representation via the powerful representational capacity of deep neural networks.
Deep Reinforcement Learning for Pairs Trading (Georgia Institute of Technology)
This work applied model-free deep reinforcement learning (DRL) in stock markets to train a pairs trading agent with the goal of maximizing long-term income, albeit possibly at the expense of short-term gain.


References
Optimization of trading systems and portfolios
  • J. Moody, Lizhong Wu
  • Economics, Computer Science
    Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr)
  • 1997
It is found that maximizing the differential Sharpe ratio yields more consistent results than maximizing profits, and that both methods outperform a trading system based on forecasts that minimize MSE.
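The differential Sharpe ratio referenced here is an incremental approximation to the Sharpe ratio's sensitivity to the most recent return, built from exponentially weighted first and second moments of the returns. A minimal sketch, with the adaptation rate `eta` as a free parameter:

```python
def differential_sharpe(returns, eta=0.01):
    """Incrementally compute D_t = (B_{t-1}*dA_t - 0.5*A_{t-1}*dB_t)
    / (B_{t-1} - A_{t-1}**2)**1.5, where the moving moments are
    A_t = A_{t-1} + eta*dA_t with dA_t = R_t - A_{t-1},
    and B_t is updated the same way from R_t**2."""
    A, B = 0.0, 0.0
    out = []
    for R in returns:
        dA, dB = R - A, R * R - B
        denom = (B - A * A) ** 1.5
        out.append((B * dA - 0.5 * A * dB) / denom if denom > 0 else 0.0)
        A, B = A + eta * dA, B + eta * dB
    return out

D = differential_sharpe([0.01, 0.02, 0.03])
```

Because D_t depends only on the running moments and the latest return, it provides a per-step reward suitable for online learning, which is why maximizing it can yield more consistent behaviour than maximizing raw profit.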
Performance functions and reinforcement learning for trading systems and portfolios
We propose to train trading systems and portfolios by optimizing objective functions that directly measure trading and investment performance. Rather than basing a trading system on forecasts or
Optimal Asset Allocation using Adaptive Dynamic Programming
Asset allocation is formalized as a Markovian decision problem that can be optimized by applying dynamic-programming or reinforcement-learning-based algorithms, and the resulting strategy is shown to be equivalent to a policy computed by dynamic programming.
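The dynamic-programming solution alluded to is the standard Bellman backup V(s) = max_a [r(s,a) + gamma * sum_{s'} P(s'|s,a) V(s')]. A value-iteration sketch on a made-up MDP follows; the transition tensor `P` and rewards `r` are illustrative, not the paper's asset-allocation model:

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, tol=1e-8):
    """P[a, s, s'] = transition probabilities, r[a, s] = rewards.
    Iterates the Bellman backup until the value function converges."""
    V = np.zeros(r.shape[1])
    while True:
        Q = r + gamma * np.einsum("ast,t->as", P, V)   # Q(a, s)
        V_new = Q.max(axis=0)                          # greedy backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Single-state toy: action 1 pays 1 per step, so V* = 1 / (1 - gamma) = 10.
P = np.ones((2, 1, 1))
r = np.array([[0.0], [1.0]])
V = value_iteration(P, r)
```

Reinforcement learning methods approximate the same fixed point from sampled transitions when `P` and `r` are unknown, which is the equivalence the entry refers to.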
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks
The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks.
Improving Elevator Performance Using Reinforcement Learning
Results in simulation surpass the best of the heuristic elevator control algorithms of which the author is aware and demonstrate the power of RL on a very large scale stochastic dynamic optimization problem of practical utility.
High-Performance Job-Shop Scheduling With A Time-Delay TD-lambda Network
This paper shows how to extend the time-delay neural network (TDNN) architecture to irregular-length schedules and shows that this TDNN-TD(λ) network can match the performance of the previous hand-engineered system.
Neurogammon Wins Computer Olympiad
Neurogammon 1.0 is a backgammon program which uses multilayer neural networks to make move decisions and doubling decisions and won the First Computer Olympiad in London with a perfect record of five wins and no losses.
Decision Technologies for Financial Engineering
This volume selects the best contributions from the Fourth International Conference on Neural Networks in the Capital Markets (NNCM). The conference brought together academics from several
Neurogammon wins the Computer Olympiad, Neural Computation 1, 321–323
  • 1989
Optimal asset allocation using adaptive dynamic programming, Advances in NIPS
  • 1996