Deep Reinforcement Learning for Stock Portfolio Optimization

  • L. T. Hieu
  • Published 1 October 2020
  • Computer Science
  • ArXiv
Stock portfolio optimization is the process of continually redistributing money across a pool of stocks. In this paper, we formulate the problem so that Reinforcement Learning can be applied to it properly. To keep the market assumptions realistic, we also incorporate transaction cost and a risk factor into the state. On top of that, we apply various state-of-the-art Deep Reinforcement Learning algorithms for comparison. Since the action space is…
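
The redistribution step described in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: the function name `rebalance_step`, the `cost_rate` parameter, and the cost model (a fee proportional to turnover, charged before prices move) are all assumptions made for illustration.

```python
import numpy as np

def rebalance_step(value, weights, target_weights, price_relatives, cost_rate=0.0025):
    """One hypothetical period: trade to the target weights, pay costs, let prices move.

    Assumption: the cost is cost_rate times turnover (sum of absolute weight
    changes), a common simplification rather than the paper's own model.
    """
    weights = np.asarray(weights, dtype=float)
    target_weights = np.asarray(target_weights, dtype=float)
    price_relatives = np.asarray(price_relatives, dtype=float)  # p_t / p_{t-1} per stock

    # Fraction of the portfolio that has to be traded to reach the target.
    turnover = np.abs(target_weights - weights).sum()
    value_after_cost = value * (1.0 - cost_rate * turnover)

    # Portfolio growth over the period at the new weights.
    growth = float(target_weights @ price_relatives)
    new_value = value_after_cost * growth

    # Weights drift with prices, so the next period starts from these.
    drifted_weights = target_weights * price_relatives / growth
    return new_value, drifted_weights

new_value, drifted = rebalance_step(
    value=100.0,
    weights=[0.5, 0.5],
    target_weights=[0.5, 0.5],
    price_relatives=[1.1, 0.9],
)
```

In a Reinforcement Learning setting, `target_weights` would play the role of the action, and a function of `new_value` (for example its log return) could serve as the reward. That mapping is also an assumption here.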

Balancing Profit, Risk, and Sustainability for Portfolio Management
  • Charl Maree, C. Omlin
  • Computer Science
    2022 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr)
  • 2022
This work develops a novel utility function, with the Sharpe ratio representing risk and the environmental, social, and governance (ESG) score representing sustainability, and proposes a system that does not merely report these metrics but actively optimizes the portfolio to improve on them.


A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem
A financial-model-free Reinforcement Learning framework to provide a deep machine learning solution to the portfolio management problem, able to achieve at least 4-fold returns in 50 days.
Indicator selection for daily equity trading with recurrent reinforcement learning
It is found that trading performance, measured by out-of-sample daily Sharpe ratios, improves: the number of companies with a positive and significant Sharpe ratio increases after feeding the selected indicators jointly with price information into the RRL system.
Performance functions and reinforcement learning for trading systems and portfolios
We propose to train trading systems and portfolios by optimizing objective functions that directly measure trading and investment performance. Rather than basing a trading system on forecasts or
Generalized deterministic policy gradient algorithms
To overcome the challenge of high sample complexity of DPG in this setting, the Generalized Deterministic Policy Gradient algorithm is proposed, to optimize a weighted objective of the original Markov decision process and an augmented MDP that simplifies the original one, and serves as its lower bound.
Minimum-Variance Portfolio Composition
Empirical studies document that equity portfolios constructed to have the lowest possible risk have surprisingly high average returns. Clarke, de Silva, and Thorley derive an analytic solution for the
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective