Corpus ID: 245424738

Direct Behavior Specification via Constrained Reinforcement Learning

Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Joseph Pal
The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden. Most often, practitioners approach behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the…
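The constrained-RL formulation the abstract refers to is typically solved via a Lagrangian relaxation: maximize expected reward subject to expected cost staying below a threshold. A minimal sketch of one dual-ascent step, with hypothetical names and scalar quantities chosen purely for illustration:

```python
# Sketch of the Lagrangian relaxation commonly used for constrained RL:
# maximize E[reward] subject to E[cost] <= threshold. The policy ascends
# a scalarized objective while the multiplier follows dual gradient ascent.

def lagrangian_objective(reward, cost, lam):
    """Scalarized objective the policy would ascend."""
    return reward - lam * cost

def update_multiplier(lam, avg_episode_cost, threshold, lr=0.05):
    """Dual ascent: grow lambda while the constraint is violated,
    shrink it (never below zero) when there is slack."""
    return max(lam + lr * (avg_episode_cost - threshold), 0.0)

# Toy usage: a constraint E[cost] <= 1.0 that starts out violated.
lam = 0.0
for avg_cost in [2.0, 1.8, 1.5, 1.2, 1.0]:
    lam = update_multiplier(lam, avg_cost, threshold=1.0)
# lam has grown, so the scalarized objective now penalizes cost.
```

The appeal for behavior specification is that the threshold is stated directly (e.g. "violate this behavior at most 5% of the time") rather than encoded indirectly through reward-weight tuning.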


Automated Play-Testing through RL Based Human-Like Play-Styles Generation

This work presents CARMI: a Configurable Agent with Relative Metrics as Input, an agent able to emulate players' play-styles even on previously unseen levels, making it compatible with the constraints of modern video game production.

Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation

This work addresses the problem of safe reinforcement learning from pixel observations in a constrained, partially observable Markov decision process framework, and employs a novel safety critic using the stochastic latent actor-critic (SLAC) approach.

Towards Informed Design and Validation Assistance in Computer Games Using Imitation Learning

The survey results show that the proposed method is indeed a valid approach to game validation and that data-driven programming would be a useful aid for reducing the effort and increasing the quality of modern playtesting.

Constrained Reinforcement Learning for Robotics via Scenario-Based Programming

This paper presents a novel technique for incorporating domain-expert knowledge into a constrained DRL training loop that exploits the scenario-based programming paradigm, which is designed to allow specifying such knowledge in a simple and intuitive way.

Verifying Learning-Based Robotic Navigation Systems

This work is the first to demonstrate the use of DNN verification backends for recognizing suboptimal DRL policies in real-world robots, and for filtering out unwanted policies.

Beyond Ads: Sequential Decision-Making Algorithms in Law and Public Policy

The main thesis is that law and public policy pose distinct methodological challenges that the machine learning community has not yet addressed, and that machine learning will need to address these problems to move "beyond ads."



Benchmarking Safe Exploration in Deep Reinforcement Learning

This work proposes to standardize constrained RL as the main formalism for safe exploration, and presents the Safety Gym benchmark suite, a new slate of high-dimensional continuous control environments for measuring research progress on constrained RL.

Soft Actor-Critic Algorithms and Applications

Soft Actor-Critic (SAC), a recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework, achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample efficiency and asymptotic performance.
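The maximum entropy framework behind SAC augments the reward with an entropy bonus weighted by a temperature. A toy sketch for a discrete action distribution (illustrative only; SAC itself operates on continuous actions with learned function approximators):

```python
import math

def entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_value(reward, probs, alpha=0.2):
    """Maximum-entropy RL scores a policy by expected reward plus
    an entropy bonus weighted by the temperature alpha."""
    return reward + alpha * entropy(probs)
```

At equal reward, a stochastic policy scores higher than a deterministic one, which is what encourages the exploration and robustness the framework is known for.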

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
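The clipped form of PPO's surrogate objective can be sketched in a few lines (a minimal scalar illustration, not the authors' reference implementation):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate: take the minimum of the unclipped and
    clipped terms so overly large policy updates are not rewarded."""
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, pushing the probability ratio beyond 1 + eps yields no additional objective value, which discourages destructively large policy updates.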

Reinforcement Learning: An Introduction

This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the field's intellectual foundations to the most recent developments and applications.

Deep Reinforcement Learning for Navigation in AAA Video Games

This work proposes to use Deep Reinforcement Learning (Deep RL) to learn how to navigate 3D maps using any navigation ability, and finds that this approach performs surprisingly well, achieving at least a 90% success rate on all tested scenarios.

Imitation Learning: Progress, Taxonomies and Opportunities

This survey provides a systematic review of imitation learning, introducing background knowledge from its development history and preliminaries, then presenting different taxonomies within imitation learning and key milestones of the field.

First Order Constrained Optimization in Policy Space

This work proposes a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS) which maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints.

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

This work proposes a novel Lagrange multiplier update method that utilizes derivatives of the constraint function, and introduces a new method to ease controller tuning by providing invariance to the relative numerical scales of reward and cost.
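The PID-style multiplier update described here can be illustrated with a small sketch that treats the constraint violation as the error signal of a PID controller (hypothetical gains and names; not the paper's exact formulation):

```python
class PIDLagrangian:
    """Sketch: drive the Lagrange multiplier with a PID controller
    whose error signal is the constraint violation."""

    def __init__(self, kp=0.1, ki=0.01, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, avg_cost, threshold):
        error = avg_cost - threshold          # constraint violation
        self.integral = max(self.integral + error, 0.0)
        derivative = error - self.prev_error  # reacts to cost trends
        self.prev_error = error
        lam = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(lam, 0.0)                  # multiplier stays nonnegative
```

The derivative term lets the multiplier react to the trend of the cost rather than only its accumulated violation, which is the intuition behind the responsiveness claim.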


Constrained Markov Decision Processes

Contents include: an introduction with examples of constrained dynamic control problems; solution approaches for CMDPs with expected costs; other types of CMDPs; cost criteria and assumptions; and the convex analytical approach.