• Corpus ID: 203610302

Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems

  title={Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems},
  author={Hardik Meisheri and Vinita Baniwal and Nazneen N. Sultana and Balaraman Ravindran and Harshad Khadilkar},
This paper describes a purely data-driven solution to a class of sequential decision-making problems with a large number of concurrent online decisions, with applications to computing systems and operations research. We assume that while the micro-level behaviour of the system can be broadly captured by analytical expressions or simulation, the macro-level or emergent behaviour is complicated by non-linearity, constraints, and stochasticity. If we represent the set of concurrent decisions to be… 


Actor-Critic Algorithms
This thesis proposes and studies actor-critic algorithms which combine the above two approaches with simulation to find the best policy among a parameterized class of policies, and proves convergence of the algorithms for problems with general state and decision spaces.
A Comprehensive Survey of Multiagent Reinforcement Learning
The benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied, and an outlook for the field is provided.
Reinforcement learning of motor skills in high dimensions: A path integral approach
This paper derives a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals, and believes that this new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.
Overcoming Exploration in Reinforcement Learning with Demonstrations
This work uses demonstrations to overcome the exploration problem and successfully learn to perform long-horizon, multi-step robotics tasks with continuous control such as stacking blocks with a robot arm.
Action Branching Architectures for Deep Reinforcement Learning
The empirical results show that the proposed agent scales gracefully to environments with increasing action dimensionality and indicate the significance of the shared decision module in coordination of the distributed action branches.
Handbook of Learning and Approximate Dynamic Programming
This chapter discusses reinforcement learning in large, high-dimensional state spaces, model-based adaptive critic designs, and applications of approximate dynamic programming in power systems control.
Dynamic Programming and Optimal Control
The leading and most up-to-date textbook on the far-ranging algorithmic methododogy of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Efficient Reductions for Imitation Learning
This work proposes two alternative algorithms for imitation learning where training occurs over several episodes of interaction and shows that this leads to stronger performance guarantees and improved performance on two challenging problems: training a learner to play a 3D racing game and Mario Bros.