Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems
@article{Meisheri2019ReinforcementLF,
  title   = {Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems},
  author  = {Hardik Meisheri and Vinita Baniwal and Nazneen N. Sultana and Balaraman Ravindran and Harshad Khadilkar},
  journal = {ArXiv},
  year    = {2019},
  volume  = {abs/1910.00211}
}
This paper describes a purely data-driven solution to a class of sequential decision-making problems with a large number of concurrent online decisions, with applications to computing systems and operations research. We assume that while the micro-level behaviour of the system can be broadly captured by analytical expressions or simulation, the macro-level or emergent behaviour is complicated by non-linearity, constraints, and stochasticity. If we represent the set of concurrent decisions to be…
References
Showing references 1-10 of 64.
Actor-Critic Algorithms
- Computer Science · NIPS
- 1999
This work proposes and studies actor-critic algorithms, which combine actor-only (policy-based) and critic-only (value-based) approaches with simulation to find the best policy within a parameterized class of policies, and proves convergence of the algorithms for problems with general state and decision spaces.
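To make the actor/critic split concrete, here is a minimal, generic one-step tabular actor-critic. It is a sketch under stated assumptions, not the cited algorithm (which analyses a critic with linear function approximation): the four-state corridor environment, the step sizes, and every name (`step`, `train`, `MOVES`, `GOAL`) are hypothetical. The critic learns state values from the TD error, and the actor shifts softmax action preferences in the direction of that same TD error.

```python
import numpy as np

# Hypothetical environment: a 4-state corridor, start in state 0, state 3 is the terminal goal.
N_STATES = 4
GOAL = N_STATES - 1
MOVES = (-1, +1)  # action 0 = left, action 1 = right


def step(s, a):
    """Apply a move (clipped at the walls); reward 1.0 only when the goal is reached."""
    s_next = min(max(s + MOVES[a], 0), GOAL)
    done = s_next == GOAL
    return s_next, (1.0 if done else 0.0), done


def softmax(x):
    z = np.exp(x - x.max())  # subtract the max for numerical stability
    return z / z.sum()


def train(episodes=2000, gamma=0.9, alpha_critic=0.1, alpha_actor=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, len(MOVES)))  # actor: softmax action preferences per state
    v = np.zeros(N_STATES)                    # critic: tabular state-value estimates
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            pi = softmax(theta[s])
            a = rng.choice(len(MOVES), p=pi)
            s_next, r, done = step(s, a)
            # Critic: one-step TD error (no bootstrapping from the terminal state)
            target = r if done else r + gamma * v[s_next]
            delta = target - v[s]
            v[s] += alpha_critic * delta
            # Actor: follow grad log pi(a|s) = onehot(a) - pi, scaled by the critic's TD error
            grad_log = -pi
            grad_log[a] += 1.0
            theta[s] += alpha_actor * delta * grad_log
            s, t = s_next, t + 1
    return theta, v
```

The actor's step size is deliberately smaller than the critic's so that the value estimates can keep up with the slowly changing policy; this is a design choice for the toy, loosely in the spirit of two-timescale actor-critic analyses.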
A Comprehensive Survey of Multiagent Reinforcement Learning
- Computer Science · IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)
- 2008
The benefits and challenges of multiagent reinforcement learning (MARL) are described, along with some of the problem domains where MARL techniques have been applied, and an outlook for the field is provided.
Reinforcement learning of motor skills in high dimensions: A path integral approach
- Computer Science · 2010 IEEE International Conference on Robotics and Automation
- 2010
This paper derives a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. The authors argue that the resulting algorithm, Policy Improvement with Path Integrals (PI2), is currently one of the most efficient, numerically robust, and easy-to-implement RL algorithms for robotics.
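The core of a PI2-style update is to perturb the policy parameters, turn each perturbed rollout's cost into a probability via an exponential of the (normalized) cost, and move the parameters by the probability-weighted average of the perturbations. The sketch below is simplified and makes assumptions: the full PI2 computes these probabilities per time step from each rollout's cost-to-go and then averages over time, whereas here each rollout is collapsed into a single cost. The quadratic cost, `pi2_style_update`, `optimise`, and the parameter values are all hypothetical.

```python
import numpy as np


def pi2_style_update(theta, cost_fn, rng, n_rollouts=20, noise_std=0.5, h=10.0):
    """One simplified, episode-level PI2-style update of a parameter vector."""
    eps = rng.normal(scale=noise_std, size=(n_rollouts, theta.size))  # exploration noise
    costs = np.array([cost_fn(theta + e) for e in eps])
    # Min-max normalize costs so the sharpness h does not depend on the cost scale
    s = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    weights = np.exp(-h * s)      # low cost -> high probability
    weights /= weights.sum()
    return theta + weights @ eps  # probability-weighted average of the perturbations


def optimise(target, iters=200, seed=0):
    """Hypothetical usage: drive theta toward `target` under a quadratic rollout cost."""
    rng = np.random.default_rng(seed)
    cost = lambda p: float(np.sum((p - target) ** 2))
    theta = np.zeros_like(target)
    for _ in range(iters):
        theta = pi2_style_update(theta, cost, rng)
    return theta
```

No gradient of the cost is ever computed; only rollout costs are needed, which is why this family of updates is attractive when the system can only be simulated.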
Case-based reinforcement learning for dynamic inventory control in a multi-agent supply-chain system
- Business · Expert Syst. Appl.
- 2009
Overcoming Exploration in Reinforcement Learning with Demonstrations
- Computer Science · 2018 IEEE International Conference on Robotics and Automation (ICRA)
- 2018
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
Action Branching Architectures for Deep Reinforcement Learning
- Computer Science · AAAI
- 2018
The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and highlight the importance of the shared decision module for coordinating the distributed action branches.
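The scaling argument is easy to check with arithmetic: a flat Q-network over N action dimensions with n sub-actions each needs n^N outputs, while one head per dimension needs only N·n. The toy forward pass below uses random, untrained weights; the mean-subtracted (dueling-style) combination of a shared state value with per-branch advantages is an illustrative assumption, and `BranchingQ`, `output_sizes`, and all sizes are hypothetical names and values.

```python
import numpy as np


def output_sizes(n_dims, n_sub_actions):
    """Q-value outputs needed: flat joint head vs. one branch per action dimension."""
    flat = n_sub_actions ** n_dims     # every combination of sub-actions
    branched = n_dims * n_sub_actions  # one head of n_sub_actions outputs per dimension
    return flat, branched


class BranchingQ:
    """Toy branching Q-network: shared trunk, one state-value stream, one advantage head per dimension."""

    def __init__(self, obs_dim, n_dims, n_sub_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w_trunk = rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.w_value = rng.normal(scale=0.1, size=(hidden, 1))
        self.w_heads = rng.normal(scale=0.1, size=(n_dims, hidden, n_sub_actions))

    def q_values(self, obs):
        h = np.tanh(obs @ self.w_trunk)                   # shared representation, shape (hidden,)
        v = (h @ self.w_value)[0]                         # scalar state value
        adv = np.einsum("h,dha->da", h, self.w_heads)     # per-branch advantages, (n_dims, n_sub_actions)
        return v + adv - adv.mean(axis=1, keepdims=True)  # dueling-style aggregation in each branch

    def act(self, obs):
        # Each action dimension picks its own sub-action independently
        return self.q_values(obs).argmax(axis=1)
```

For example, `output_sizes(10, 5)` gives `(9765625, 50)`: ten concurrent decisions with five options each would need almost ten million joint outputs but only fifty branched ones.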
Handbook of Learning and Approximate Dynamic Programming
- Computer Science · IEEE Transactions on Automatic Control
- 2006
This chapter discusses reinforcement learning in large, high-dimensional state spaces, model-based adaptive critic designs, and applications of approximate dynamic programming in power systems control.
Dynamic Programming and Optimal Control
- Computer Science
- 1995
The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential…
Continuous control with deep reinforcement learning
- Computer Science · ICLR
- 2016
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end, directly from raw pixel inputs.
Efficient Reductions for Imitation Learning
- Computer Science · AISTATS
- 2010
This work proposes two alternative algorithms for imitation learning in which training occurs over several episodes of interaction, and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.