Corpus ID: 232240103

Learning to Shape Rewards using a Game of Switching Controls

Authors: David Henry Mguni, Jianhong Wang, Taher Jafferjee, Nicolas Perez Nieves, Wenbin Song, Yaodong Yang, Feifei Tong, Hui Chen, Jiangcheng Zhu, Yali Du, Jun Wang
Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimal Shaping Algorithm (ROSA), an automated RS framework in which the shaping-reward function is…


Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

A novel architecture named Multi-Agent Transformer (MAT) is introduced that effectively casts cooperative multi-agent reinforcement learning (MARL) as a sequence modeling (SM) problem, in which the task is to map the agents' observation sequence to the agents' optimal action sequence, and endows MAT with a monotonic performance improvement guarantee.



Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

This paper formally derives the gradient of the expected true reward with respect to the parameters of the shaping weight function and accordingly proposes three learning algorithms based on different assumptions.

On Learning Intrinsic Rewards for Policy Gradient Methods

This paper derives a novel algorithm for learning intrinsic rewards for policy-gradient-based learning agents, and compares augmented A2C- and PPO-based policy learners that use this algorithm to provide additive intrinsic rewards against baseline agents that use the same policy learners but only extrinsic rewards.

Reward Shaping via Meta-Learning

A general meta-learning framework is proposed to automatically learn efficient reward shaping on newly sampled tasks, assuming only a shared state space but not necessarily a shared action space, and the theoretically optimal reward shaping in terms of credit assignment in model-free RL is derived.

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

This work introduces a simple and effective model-free method for learning from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state, and adds an auxiliary distance-based reward computed over pairs of rollouts to encourage diverse exploration.
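The distance-to-goal idea above can be sketched minimally: replace a sparse success signal with a dense reward that grows as the agent approaches the goal. The potential function and trajectory below are hypothetical, not taken from the cited paper; this is only an illustration of the shaping signal, assuming a Euclidean state space.

```python
import math

def distance_to_goal_reward(state, goal):
    """Dense shaping signal: negative Euclidean distance to the goal.

    A sparse task only pays out at the goal state; this illustrative
    shaping gives the agent a learning signal at every step instead.
    """
    return -math.dist(state, goal)

# Hypothetical 2-D trajectory moving toward a goal at (4, 0).
goal = (4.0, 0.0)
trajectory = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
rewards = [distance_to_goal_reward(s, goal) for s in trajectory]
print(rewards)  # increases monotonically as the agent nears the goal
```

The reward rises from -4.0 to 0.0 along the trajectory, which is exactly the kind of gradient a sparse reward lacks.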

An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems

The performance of reward shaping is demonstrated empirically in two problem domains within the context of RoboCup KeepAway by designing three reward-shaping schemes that encourage specific behaviour, such as keeping a minimum distance from other players on the same team and taking on specific roles.

Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping

Conditions under which modifications to the reward function of a Markov decision process preserve the optimal policy are investigated, shedding light on the practice of reward shaping, a method used in reinforcement learning whereby additional training rewards are used to guide the learning agent.
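The policy-invariance condition from this line of work is potential-based shaping: the extra reward must take the form F(s, s') = γΦ(s') − Φ(s) for some potential function Φ. The sketch below, with an illustrative potential and toy episode of my own choosing (not from the paper), shows why this works: the discounted shaping terms telescope to a constant that does not depend on the policy.

```python
def shaped_rewards(rewards, states, potential, gamma):
    """Augment each reward with F(s, s') = gamma * Phi(s') - Phi(s).

    `states` holds s_0 .. s_T, one more entry than `rewards`.
    """
    return [
        r + gamma * potential(states[t + 1]) - potential(states[t])
        for t, r in enumerate(rewards)
    ]

phi = lambda s: float(s)   # hypothetical potential function
gamma = 0.9
states = [0, 1, 3, 5]      # toy episode: s_0 .. s_3
rewards = [0.0, 0.0, 1.0]  # rewards for the three transitions

shaped = shaped_rewards(rewards, states, phi, gamma)

# Discounted sum of the shaping terms alone; it telescopes to
# gamma^T * Phi(s_T) - Phi(s_0), a policy-independent constant.
extra = sum(gamma**t * (s - r) for t, (s, r) in enumerate(zip(shaped, rewards)))
print(extra)  # equals 0.9**3 * phi(5) - phi(0)
```

Because every return is shifted by the same constant, the ranking of policies is unchanged, which is the invariance result the paper proves.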

Dynamic potential-based reward shaping

This paper proves and demonstrates a method of extending potential-based reward shaping to allow dynamic shaping while maintaining the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
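Dynamic shaping replaces the static potential Φ(s) with a time-indexed one, so the shaping term becomes F(s, t, s', t+1) = γΦ(s', t+1) − Φ(s, t). A minimal sketch, using a hypothetical time-decaying potential of my own choosing, shows that the discounted shaping terms still telescope to γ^T Φ(s_T, T) − Φ(s_0, 0), which is why the invariance argument carries over.

```python
def dynamic_shaping_term(phi, s, t, s_next, gamma):
    """F(s, t, s', t+1) = gamma * Phi(s', t+1) - Phi(s, t)."""
    return gamma * phi(s_next, t + 1) - phi(s, t)

# Hypothetical potential that decays over time (illustrative only).
phi = lambda s, t: s * (0.5 ** t)
gamma = 0.9
states = [0.0, 2.0, 4.0]  # toy episode: s_0 .. s_2

terms = [
    dynamic_shaping_term(phi, states[t], t, states[t + 1], gamma)
    for t in range(len(states) - 1)
]

# The discounted sum telescopes to gamma^2 * Phi(s_2, 2) - Phi(s_0, 0),
# so the time-varying potential adds the same policy-independent shift.
total = sum(gamma**t * f for t, f in enumerate(terms))
print(total)  # equals 0.9**2 * phi(4.0, 2) - phi(0.0, 0)
```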

PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals

This work proposes PlanGAN, a model-based algorithm specifically designed for solving multi-goal tasks in environments with sparse rewards; results indicate that it achieves comparable performance while being around 4-8 times more sample-efficient.

Theoretical considerations of potential-based reward shaping for multi-agent systems

It is proven that the equivalence to Q-table initialisation remains and that the Nash equilibria of the underlying stochastic game are not modified; it is also demonstrated empirically that potential-based reward shaping affects exploration and, consequently, can alter the joint policy converged upon.