Corpus ID: 236087991

Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning

Mingqi Yuan, Mon-on Pun, Yi Chen, Dong Wang, Haojun Li
Maintaining long-term exploration ability remains one of the challenges of deep reinforcement learning (DRL). In practice, intrinsic reward shaping (IRS) approaches are leveraged to provide intrinsic rewards that incentivize the agent's exploration. However, most existing IRS modules rely on attendant models or additional memory to record and analyze learning procedures, which leads to high computational complexity and low robustness. Moreover, they overemphasize the influence of a single state on…


Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning

A novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL) is proposed, designing multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.

Deep reinforcement learning and its applications in medical imaging and radiation therapy: a survey

The basics of reinforcement learning are introduced and various categories of DRL algorithms and DRL models developed for medical image analysis and radiation treatment planning optimization are reviewed, which can resolve the challenges from scarce and heterogeneous annotated medical image data.

Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks

A new heterogeneous federated DRL (HFDRL) algorithm is proposed to select the best subset of semantically relevant DRL agents for collaboration to enable a group of heterogeneous untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.

Utility of doctrine with multi-agent RL for military engagements

A hybrid training approach combining MARL with doctrine (MARDOC) is introduced to test whether doctrine-informed MARL policies produce more realistic behaviors and/or improved performance in a simple military engagement task; the results suggest that MARDOC provides an advantage over MARL alone, since doctrinal guidance steers MARL exploration through the complexities of military domains.

Semantic Communications for 6G Future Internet: Fundamentals, Applications, and Challenges

This paper investigates the fundamentals of SemCom, its applications in 6G networks, and the remaining challenges and open issues for future research directions.

A Scenario- and Reinforcement Learning-Based BESS Control Strategy for the Frequency Restoration Reserve Market

Dealing with uncertainty is a key component in the control of Battery Energy Storage Systems (BESSs) on the electricity markets. This study proposes the incorporation of scenario sets as a…

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

This paper considers the challenging Atari games domain and proposes a new exploration method that assigns exploration bonuses from a concurrently learned model of the system dynamics; it provides the most consistent improvement across a range of games that pose a major challenge for prior methods.

RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

This work proposes a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation and rewards the agent substantially more for interacting with objects that it can control.
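As a rough illustration (not the paper's implementation; the embedding function and episodic count are stand-ins), an impact-driven bonus can be sketched as the change in a learned state embedding, discounted by an episodic visit count:

```python
import numpy as np

def ride_reward(phi_s, phi_s_next, episodic_count):
    """Impact-driven intrinsic reward (sketch): the L2 change in the
    learned state embedding phi, divided by the square root of the
    episodic visit count so repeated transitions pay out less."""
    impact = np.linalg.norm(phi_s_next - phi_s)
    return impact / np.sqrt(episodic_count)

# A large embedding change on a first visit earns a large bonus,
# while the same change seen many times is down-weighted.
r_novel = ride_reward(np.zeros(4), np.ones(4), episodic_count=1)
r_seen = ride_reward(np.zeros(4), np.ones(4), episodic_count=100)
```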

Go-Explore: a New Approach for Hard-Exploration Problems

A new algorithm called Go-Explore is proposed, which exploits the following principles: remember previously visited states, solve simulated environments through any available means, and robustify via imitation learning; this results in a dramatic performance improvement on hard-exploration problems.

State Entropy Maximization with Random Encoders for Efficient Exploration

The experiments show that RE3 significantly improves the sample-efficiency of both model-free and model-based RL methods on locomotion and navigation tasks from DeepMind Control Suite and MiniGrid benchmarks, and allows learning diverse behaviors without extrinsic rewards.
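A minimal sketch of the idea, with a random linear projection standing in for the paper's frozen random encoder network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, randomly initialized "encoder" (RE3 freezes a random network;
# a random linear projection stands in for it here).
W = rng.normal(size=(8, 3))

def re3_bonus(states, k=3):
    """k-nearest-neighbor state-entropy bonus in the random embedding
    space: r_i = log(||y_i - y_i^(k)|| + 1), where y_i^(k) is the k-th
    nearest embedded neighbor of y_i."""
    y = states @ W                      # embed states without any training
    dists = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    knn = np.sort(dists, axis=1)[:, k]  # column 0 is the self-distance
    return np.log(knn + 1.0)

batch = rng.normal(size=(16, 8))
bonuses = re3_bonus(batch)
```

States whose embeddings sit far from their neighbors (i.e. in sparsely visited regions) receive larger bonuses, which is what drives the entropy-maximizing exploration.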

Intrinsic Reward Driven Imitation Learning via Generative Model

This work proposes a novel reward learning module that generates intrinsic reward signals via a generative model performing forward state transition and backward action encoding, which improves the module's ability to model the environment's dynamics.

On the sample complexity of reinforcement learning.

Novel algorithms with more restricted guarantees are suggested whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class, but have only a polynomial dependence on the horizon time.

Never Give Up: Learning Directed Exploration Strategies

This work constructs an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies, thereby encouraging the agent to repeatedly revisit all states in its environment.
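In spirit (the kernel and constants here are illustrative stand-ins, not the paper's exact hyperparameters), such an episodic bonus compares the current embedding against its k nearest neighbors in the episode's memory:

```python
import numpy as np

def episodic_bonus(embedding, memory, k=5, eps=1e-3):
    """Episodic intrinsic reward sketch: an inverse-kernel score over the
    k nearest embeddings stored this episode, so states far from the
    agent's recent experience score higher."""
    if not memory:
        return 1.0
    mem = np.stack(memory)
    d = np.sort(np.sum((mem - embedding) ** 2, axis=1))[:k]  # k smallest squared distances
    kernel = eps / (d / (d.mean() + 1e-8) + eps)             # similarity to nearby memories
    return 1.0 / np.sqrt(kernel.sum() + 1e-8)

memory = [np.zeros(2), np.full(2, 0.1)]
b_near = episodic_bonus(np.zeros(2), memory)      # already in memory: small bonus
b_far = episodic_bonus(np.full(2, 10.0), memory)  # far from memory: large bonus
```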

Episodic Curiosity through Reachability

A new curiosity method is proposed which uses episodic memory to form the novelty bonus, based on how many environment steps it takes to reach the current observation from those in memory, thereby incorporating rich information about environment dynamics.

Large-Scale Study of Curiosity-Driven Learning

This paper performs the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite, and shows surprisingly good performance.

Efficient Exploration via State Marginal Matching

This work recasts exploration as a problem of State Marginal Matching (SMM) and demonstrates that agents directly optimizing the SMM objective explore faster and adapt more quickly to new tasks than prior exploration methods.
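On a small discrete state space, the flavor of the SMM objective can be conveyed by rewarding states in proportion to log p*(s) − log ρ(s); the tabular setting and Laplace smoothing below are assumptions for illustration, not the paper's method:

```python
import numpy as np

def smm_reward(state_idx, target_probs, visit_counts):
    """State-marginal-matching reward sketch: log p*(s) - log rho(s),
    where rho is a (Laplace-smoothed) empirical state marginal.
    Over-visited states are penalized; under-visited ones rewarded."""
    rho = (visit_counts + 1.0) / (visit_counts.sum() + len(visit_counts))
    return np.log(target_probs[state_idx]) - np.log(rho[state_idx])

# Uniform target over 4 states; state 0 has been visited heavily.
target = np.full(4, 0.25)
counts = np.array([100.0, 0.0, 0.0, 0.0])
r_overvisited = smm_reward(0, target, counts)  # negative: visited too often
r_neglected = smm_reward(1, target, counts)    # positive: visited too rarely
```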