An Actor-Critic Method for Simulation-Based Optimization

Kuo Li, Qing-Shan Jia, Jiaqi Yan

Soft Actor-Critic Algorithms and Applications

Soft Actor-Critic (SAC), the recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

This paper proposes soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework, and achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods.
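In the maximum-entropy framework behind SAC, the soft state value over a discrete action set is a temperature-scaled log-sum-exp of the Q-values, recovering the hard max as the temperature goes to zero. A minimal sketch, assuming illustrative Q-values and a temperature `alpha` (not taken from the paper):

```python
import numpy as np

def soft_value(q_values, alpha=0.5):
    """Soft (entropy-regularized) state value: V = alpha * log sum_a exp(Q(a)/alpha).

    As alpha -> 0 this recovers the hard max over actions; larger alpha
    rewards keeping the policy stochastic.
    """
    q = np.asarray(q_values, dtype=float)
    m = q.max()  # subtract the max for numerical stability
    return m + alpha * np.log(np.sum(np.exp((q - m) / alpha)))
```

The soft value always upper-bounds the best single Q-value, with the gap shrinking as `alpha` decreases.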

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective.
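PPO's surrogate objective takes, per sample, the minimum of the probability-ratio-weighted advantage and a clipped version of it, removing the incentive to push the ratio outside a small interval. A sketch under an assumed clip range of 0.2 (the paper's common default):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A).

    `ratio` is pi_new(a|s) / pi_old(a|s); clipping caps how much a single
    update can benefit from moving the ratio away from 1.
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    return np.minimum(unclipped, clipped)
```

With a positive advantage the objective saturates once the ratio exceeds `1 + eps`; with a negative advantage it penalizes ratios below `1 - eps`.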

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
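The deterministic policy gradient that this algorithm (DDPG) builds on chains the gradient of the critic with respect to the action into the actor update. A sketch of the dQ/da factor alone, using a finite difference on a hypothetical scalar-action critic `q_fn` (autodiff would be used in practice):

```python
def dQ_daction(q_fn, action, eps=1e-5):
    """Finite-difference estimate of dQ/da for a scalar action (sketch).

    DDPG's actor ascends grad_theta Q(s, mu_theta(s)) = dQ/da * dmu/dtheta;
    this helper illustrates only the dQ/da factor.
    """
    return (q_fn(action + eps) - q_fn(action - eps)) / (2.0 * eps)
```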

Addressing Function Approximation Error in Actor-Critic Methods

This paper builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation, and draws the connection between target networks and overestimation bias.
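The clipped double-Q idea enters through the TD target: the bootstrap term uses the smaller of the two critics' next-state estimates. A minimal sketch with illustrative scalar inputs:

```python
def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """TD target using the minimum of two critic estimates (TD3-style).

    Taking min(Q1, Q2) limits the overestimation bias that a single
    function approximator accumulates through the max in the target.
    """
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)
```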

Reinforcement Learning with Deep Energy-Based Policies

This paper proposes a method for learning expressive energy-based policies for continuous states and actions, which was previously feasible only in tabular domains, and introduces a new algorithm, called soft Q-learning, that expresses the optimal policy via a Boltzmann distribution.
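For a discrete action set, the Boltzmann policy that soft Q-learning targets assigns each action a probability proportional to the exponentiated Q-value. A sketch, with an assumed temperature `alpha`:

```python
import numpy as np

def boltzmann_policy(q_values, alpha=1.0):
    """Action distribution proportional to exp(Q / alpha) (soft-Q-style sketch)."""
    logits = np.asarray(q_values, dtype=float) / alpha
    logits -= logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

Lowering `alpha` concentrates the policy on the greedy action; raising it flattens the distribution toward uniform.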

Trust Region Policy Optimization

This work describes a procedure for optimizing control policies with guaranteed monotonic improvement; making several approximations to the theoretically justified scheme yields a practical algorithm, called Trust Region Policy Optimization (TRPO).
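The trust region is enforced as a KL-divergence bound between the old and new policies. For univariate Gaussian action distributions the KL has a closed form, so a constraint check can be sketched directly (the bound `delta` is an illustrative value, not the paper's):

```python
import math

def kl_univariate_gaussians(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0^2) || N(mu1, sigma1^2) ) in closed form."""
    return (math.log(sigma1 / sigma0)
            + (sigma0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * sigma1 ** 2)
            - 0.5)

def within_trust_region(mu0, sigma0, mu1, sigma1, delta=0.01):
    """Accept a candidate policy update only if the KL step stays below delta."""
    return kl_univariate_gaussians(mu0, sigma0, mu1, sigma1) <= delta
```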

Learning-Based Safety-Critical Motion Planning with Input-to-State Barrier Certificate

The proposed method with uncertainty analysis handles disturbances in the motion planning task and improves both the learning process and the adaptation to safety and performance requirements.

Dynamic Programming

Ranking and Selection as Stochastic Control

This work formulates the fully sequential sampling and selection decision in statistical ranking and selection as a stochastic control problem within a Bayesian framework, and derives an approximately optimal allocation policy that possesses both one-step-ahead and asymptotic optimality for independent normal sampling distributions.
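Sequential Bayesian ranking-and-selection policies of this kind maintain a posterior over each alternative's mean and update it after every sample. For independent normal sampling with known noise, the conjugate update has a closed form; a sketch of that update step alone (the allocation rule itself is not reproduced here):

```python
def normal_posterior(prior_mean, prior_sd, obs, obs_sd):
    """Conjugate update for a normal mean with known sampling noise.

    Precisions add; the posterior mean is the precision-weighted average
    of the prior mean and the new observation.
    """
    prior_prec = 1.0 / prior_sd ** 2
    obs_prec = 1.0 / obs_sd ** 2
    post_prec = prior_prec + obs_prec
    post_mean = (prior_mean * prior_prec + obs * obs_prec) / post_prec
    return post_mean, (1.0 / post_prec) ** 0.5
```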