Asynchronous Episodic Deep Deterministic Policy Gradient: Toward Continuous Control in Computationally Complex Environments

@article{Zhang2021AsynchronousED,
  title={Asynchronous Episodic Deep Deterministic Policy Gradient: Toward Continuous Control in Computationally Complex Environments},
  author={Zhizheng Zhang and Jiale Chen and Zhibo Chen and Weiping Li},
  journal={IEEE Transactions on Cybernetics},
  year={2021},
  volume={51},
  pages={604-613}
}
Deep deterministic policy gradient (DDPG) has proven to be a successful reinforcement learning (RL) algorithm for continuous control tasks. However, DDPG still suffers from data insufficiency and training inefficiency, especially in computationally complex environments. In this article, we propose asynchronous episodic DDPG (AE-DDPG), as an expansion of DDPG, which can achieve more effective learning with less training time required. First, we design a modified scheme for data collection…
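The abstract is cut off before the data-collection scheme is described, so the following is only a rough, hedged sketch of the general idea of decoupling environment interaction from learning in an asynchronous DDPG-style setup: worker threads push transitions into a shared buffer while a learner samples minibatches. All names here (toy_env_step, worker, learner) are hypothetical placeholders rather than the paper's implementation.

```python
# Illustrative sketch only: generic asynchronous experience collection with a
# shared replay buffer, in the spirit of decoupling data collection from learning.
# The toy environment, random "policy", and learner below are placeholders.
import random
import threading
import time
import queue

replay_buffer = []                      # shared experience memory
buffer_lock = threading.Lock()
experience_queue = queue.Queue()        # workers push, the learner drains

def toy_env_step(state, action):
    """Placeholder 1-D environment: reward is higher when the action cancels the state."""
    reward = -abs(state + action)
    return random.uniform(-1.0, 1.0), reward

def worker(n_steps=200):
    """Actor thread: interacts with its own environment copy and queues transitions."""
    state = random.uniform(-1.0, 1.0)
    for _ in range(n_steps):
        action = random.uniform(-1.0, 1.0)          # stand-in for a noisy policy
        next_state, reward = toy_env_step(state, action)
        experience_queue.put((state, action, reward, next_state))
        state = next_state

def learner(n_updates=100, batch_size=32):
    """Learner thread: moves queued transitions into the buffer and samples minibatches."""
    for _ in range(n_updates):
        while not experience_queue.empty():
            with buffer_lock:
                replay_buffer.append(experience_queue.get())
        if len(replay_buffer) >= batch_size:
            with buffer_lock:
                batch = random.sample(replay_buffer, batch_size)
            # a real learner would run DDPG critic/actor updates on `batch`
        time.sleep(0.001)

threads = [threading.Thread(target=worker) for _ in range(4)] + [threading.Thread(target=learner)]
for t in threads:
    t.start()
for t in threads:
    t.join()
while not experience_queue.empty():
    replay_buffer.append(experience_queue.get())
print(f"collected {len(replay_buffer)} transitions")
```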
Duplicated Replay Buffer for Asynchronous Deep Deterministic Policy Gradient
TLDR: This research aims to make the transition selection process more efficient by increasing the likelihood of selecting important transitions from the replay memory buffer, using a secondary replay memory buffer that stores the more critical transitions.
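As a rough illustration of the two-buffer idea described above, the sketch below duplicates "critical" transitions (judged here by an assumed reward threshold, not the paper's criterion) into a secondary buffer that contributes a fixed share of each minibatch.

```python
# Hedged sketch of a duplicated (two-tier) replay buffer: every transition goes
# into the main buffer, and "critical" ones (judged here by an assumed reward
# threshold) are copied into a secondary buffer that is sampled more often.
import random
from collections import deque

main_buffer = deque(maxlen=100_000)
critical_buffer = deque(maxlen=10_000)

def store(transition, critical_threshold=0.8):
    _, _, reward, _, _ = transition
    main_buffer.append(transition)
    if abs(reward) >= critical_threshold:            # placeholder importance test
        critical_buffer.append(transition)

def sample(batch_size=64, critical_fraction=0.25):
    """Mix ordinary and critical transitions within one minibatch."""
    n_crit = min(int(batch_size * critical_fraction), len(critical_buffer))
    batch = random.sample(list(critical_buffer), n_crit) if n_crit else []
    batch += random.sample(list(main_buffer), batch_size - n_crit)
    return batch

# usage with toy (state, action, reward, next_state, done) tuples
for _ in range(1000):
    store((random.random(), random.random(), random.uniform(-1, 1), random.random(), False))
print(len(sample()))
```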
Sample Efficient Reinforcement Learning Method via High Efficient Episodic Memory
TLDR: This paper proposes a new sample-efficient reinforcement learning architecture that introduces a new episodic memory module and incorporates it into some key components of DRL: exploration, experience replay, and the loss function.
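A minimal sketch of what an episodic memory module can look like in practice, assuming a simple table keyed by discretized state-action pairs that keeps the best Monte Carlo return seen; the discretization and write-back rule are illustrative assumptions, not the architecture proposed in the paper.

```python
# Minimal sketch of an episodic memory table (not the paper's exact module):
# per discretized state-action key it keeps the best Monte Carlo return seen,
# which can then be used to guide exploration or augment the loss.
import numpy as np

episodic_memory = {}   # key -> best discounted return observed from that key

def to_key(state, action, resolution=0.1):
    """Hypothetical discretization so continuous inputs can index a dict."""
    return tuple(np.round(np.asarray(state) / resolution).astype(int)) + (round(action, 1),)

def update_from_episode(trajectory, gamma=0.99):
    """After an episode ends, write back discounted returns, keeping the max per key."""
    ret = 0.0
    for state, action, reward in reversed(trajectory):
        ret = reward + gamma * ret
        key = to_key(state, action)
        episodic_memory[key] = max(episodic_memory.get(key, float("-inf")), ret)

# usage: one toy episode of (state, action, reward) triples
episode = [(np.array([0.1, 0.2]), 0.5, 1.0), (np.array([0.3, 0.1]), -0.2, 0.0)]
update_from_episode(episode)
print(episodic_memory)
```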
Computational Performance of Deep Reinforcement Learning to find Nash Equilibria
TLDR: The performance of deep deterministic policy gradient (DDPG) in learning Nash equilibria is tested in a setting where firms compete in prices, identifying parameter choices that reach convergence rates of up to 99%.
Sample-efficient deep reinforcement learning with directed associative graph
Reinforcement learning can be modeled mathematically as a Markov decision process. In consequence, the interaction samples, as well as the connection relations between them, are two main types of …
Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning
TLDR: This work constructs a simplified Markov decision process for which exact Q-values can be computed efficiently as more data comes in, and shows that the Q-value for each transition in the simplified MDP is a lower bound on the Q-value for the same transition in the original continuous Q-learning problem.
SEM: Adaptive Staged Experience Access Mechanism for Reinforcement Learning
TLDR: The Staged Experience Mechanism (SEM) is introduced, a novel management mechanism for experience memory that adaptively regulates the proportion of experiences based on the current learning stage, enabling agents to learn not only from new experiences but also from very old ones given limited memory capacity.
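The concrete regulation rule of SEM is not given here, so the following sketch only illustrates the general idea of stage-dependent mixing, with an assumed linear schedule that gradually raises the share of old experiences in each minibatch.

```python
# Hedged sketch of stage-dependent experience mixing (the actual SEM regulation
# rule is not reproduced here): early in training sample mostly recent
# experiences, then gradually give part of each batch to an older pool.
import random

recent_pool = [("recent", i) for i in range(500)]
old_pool = [("old", i) for i in range(500)]

def old_fraction(step, total_steps):
    """Assumed schedule: linearly raise the share of old experiences to 40%."""
    return 0.4 * min(step / total_steps, 1.0)

def sample_staged(step, total_steps, batch_size=64):
    n_old = min(int(batch_size * old_fraction(step, total_steps)), len(old_pool))
    batch = random.sample(old_pool, n_old) if n_old else []
    batch += random.sample(recent_pool, min(batch_size - n_old, len(recent_pool)))
    return batch

print(len(sample_staged(step=900, total_steps=1000)))
```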
Solving Continuous Control with Episodic Memory
TLDR: This study combines episodic memory with the actor-critic architecture by modifying the critic's objective, and improves performance by introducing episodic-based replay buffer prioritization.
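One common way to fold episodic returns into a critic target, shown purely as a hedged sketch (the paper's actual modified objective may differ), is to take the larger of the bootstrapped target and the best return stored in episodic memory.

```python
# Hedged sketch of one way to combine a bootstrapped critic target with an
# episodic return; the max-combination rule is an assumption for illustration.
def critic_target(reward, gamma, q_next, episodic_return=None, done=False):
    bootstrapped = reward + (0.0 if done else gamma * q_next)
    if episodic_return is None:
        return bootstrapped
    return max(bootstrapped, episodic_return)   # assumed combination rule

print(critic_target(reward=1.0, gamma=0.99, q_next=2.0, episodic_return=3.5))
```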
Competitive and Cooperative Heterogeneous Deep Reinforcement Learning
TLDR: This work presents a competitive and cooperative heterogeneous deep reinforcement learning framework, C2HRL, which aims to learn a superior agent that exceeds the capabilities of any individual agent in an agent pool through two agent management mechanisms.
Cooperative Heterogeneous Deep Reinforcement Learning
TLDR: A cooperative learning framework that classifies heterogeneous agents into two classes, global agents and local agents, and employs global agents to guide the learning of local agents, so that local agents can benefit from sample-efficient agents while simultaneously maintaining their own advantages, e.g., stability.
Portfolio Optimization with 2D Relative-Attentional Gated Transformer
  • Tae Wan Kim, Matloob Khushi
  • 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), 2020
TLDR: A novel deterministic policy gradient model with a 2D relative-attentional gated Transformer that better captures the peculiar structure of financial data in the portfolio optimization domain; it outperformed baseline models and demonstrated its effectiveness.

References

Showing 1-10 of 37 references
Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning
TLDR: The main incentive of this work is to keep the advantages of model-free Q-learning while minimizing real-world interaction by employing a dynamics model learned in parallel, and to counteract the adverse effects of imaginary rollouts produced with an inaccurate model.
Continuous control with deep reinforcement learning
TLDR: This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end, directly from raw pixel inputs.
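For readers unfamiliar with DDPG itself, a minimal PyTorch sketch of the core updates follows: critic regression to a bootstrapped target, deterministic policy gradient for the actor, and Polyak-averaged target networks. Hyperparameters and network sizes are placeholder choices, not the paper's.

```python
# Minimal DDPG update sketch (illustrative, not the original implementation):
# deterministic actor, Q critic, soft-updated target networks.
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 3, 1, 0.99, 0.005

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actor, critic = mlp(obs_dim, act_dim), mlp(obs_dim + act_dim, 1)
actor_targ = mlp(obs_dim, act_dim); actor_targ.load_state_dict(actor.state_dict())
critic_targ = mlp(obs_dim + act_dim, 1); critic_targ.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(obs, act, rew, next_obs, done):
    # critic: regress Q(s, a) toward r + gamma * Q'(s', mu'(s'))
    with torch.no_grad():
        target_q = rew + gamma * (1 - done) * critic_targ(
            torch.cat([next_obs, actor_targ(next_obs)], dim=-1))
    critic_loss = ((critic(torch.cat([obs, act], dim=-1)) - target_q) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # actor: deterministic policy gradient, maximize Q(s, mu(s))
    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak averaging of the target networks
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, p_targ in zip(net.parameters(), targ.parameters()):
            p_targ.data.mul_(1 - tau).add_(tau * p.data)

batch = 32
update(torch.randn(batch, obs_dim), torch.randn(batch, act_dim),
       torch.randn(batch, 1), torch.randn(batch, obs_dim), torch.zeros(batch, 1))
```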
Accelerated Methods for Deep Reinforcement Learning
TLDR: This work investigates how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs, and confirms that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances.
Episodic Memory Deep Q-Networks
TLDR: This paper presents a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training, and shows that the proposed method can lead to better sample efficiency and is more likely to find good policies.
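A hedged sketch of the general pattern of adding an episodic-memory supervision term to the standard TD loss; the weighting lambda_mem and the exact form are illustrative assumptions rather than EMDQN verbatim.

```python
# Hedged sketch: augment the usual TD loss with a term that pulls Q(s, a)
# toward the return recorded in episodic memory (lambda_mem is an assumption).
import torch

def emdqn_style_loss(q_sa, td_target, memory_return, lambda_mem=0.1):
    td_loss = (q_sa - td_target) ** 2
    memory_loss = (q_sa - memory_return) ** 2     # pull Q toward the remembered return
    return (td_loss + lambda_mem * memory_loss).mean()

q = torch.randn(32, 1, requires_grad=True)
print(emdqn_style_loss(q, torch.randn(32, 1), torch.randn(32, 1)))
```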
Distributed Distributional Deterministic Policy Gradients
TLDR: The results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks, the D4PG algorithm achieves state-of-the-art performance.
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
TLDR: A new distributed agent, IMPALA (Importance Weighted Actor-Learner Architecture), is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
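IMPALA's off-policy correction is V-trace; the sketch below computes V-trace value targets in their recursive form with NumPy, assuming per-step importance ratios rhos = pi(a|x) / mu(a|x) and the paper's default clipping constants.

```python
# Hedged NumPy sketch of V-trace value targets (recursive form), as used by IMPALA.
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, rhos, gamma=0.99,
                   rho_bar=1.0, c_bar=1.0):
    """rewards, values, rhos: arrays of length T; rhos = pi(a|x) / mu(a|x)."""
    T = len(rewards)
    values_tp1 = np.append(values[1:], bootstrap_value)
    clipped_rhos = np.minimum(rho_bar, rhos)
    cs = np.minimum(c_bar, rhos)
    deltas = clipped_rhos * (rewards + gamma * values_tp1 - values)
    vs = np.zeros(T)
    acc = 0.0
    # v_s = V(x_s) + delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})), computed backward
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs

print(vtrace_targets(rewards=np.ones(5), values=np.zeros(5),
                     bootstrap_value=0.0, rhos=np.full(5, 0.8)))
```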
Parameter Space Noise for Exploration
TLDR: This work demonstrates, through experimental comparisons of DQN, DDPG, and TRPO on high-dimensional discrete-action environments as well as continuous control tasks, that RL with parameter noise learns more efficiently than both traditional RL with action-space noise and evolutionary strategies.
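A short sketch of the parameter-noise idea: perturb a copy of the policy's weights once per rollout instead of adding noise to each action. The adaptive adjustment of the noise scale described in the paper is omitted here, and sigma is an arbitrary placeholder.

```python
# Hedged sketch of parameter space noise: a perturbed copy of the policy is
# used for a whole rollout, giving state-dependent but consistent exploration.
import copy
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))

def perturbed_copy(net, sigma=0.05):
    noisy = copy.deepcopy(net)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))    # Gaussian noise on the weights
    return noisy

exploration_policy = perturbed_copy(policy)        # use for the whole episode
obs = torch.randn(1, 3)
print(policy(obs), exploration_policy(obs))        # actions differ consistently
```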
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective …
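The clipped surrogate objective at the heart of PPO can be written in a few lines; the sketch below shows the per-sample form with a clip range of 0.2 (a typical default, not necessarily the value used in any particular experiment).

```python
# Hedged sketch of the PPO clipped surrogate objective (per-sample form).
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    ratio = torch.exp(log_prob_new - log_prob_old)             # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()               # negated to maximize

print(ppo_clip_loss(torch.randn(8), torch.randn(8), torch.randn(8)))
```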
Asynchronous Methods for Deep Reinforcement Learning
TLDR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers, and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
Prioritized Experience Replay
TLDR: A framework for prioritizing experience so as to replay important transitions more frequently and therefore learn more efficiently, applied to Deep Q-Networks, a reinforcement learning algorithm that achieved human-level performance across many Atari games.
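A hedged sketch of proportional prioritization: sampling probability grows with |TD error|^alpha and importance-sampling weights correct the induced bias. A plain list-based buffer is used for clarity; an efficient implementation would use a sum-tree.

```python
# Hedged sketch of proportional prioritized replay (list-based, not a sum-tree).
import numpy as np

class PrioritizedReplay:
    def __init__(self, capacity=10_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)
        if len(self.data) > self.capacity:
            self.data.pop(0); self.priorities.pop(0)

    def sample(self, batch_size=32, beta=0.4):
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)   # importance-sampling correction
        return [self.data[i] for i in idx], weights / weights.max(), idx

buf = PrioritizedReplay()
for i in range(100):
    buf.add(("transition", i), td_error=np.random.rand())
batch, weights, idx = buf.sample()
print(len(batch), weights.shape)
```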