Corpus ID: 231749664

An Abstraction-based Method to Verify Multi-Agent Deep Reinforcement-Learning Behaviours

Authors: Pierre El Mqirmi, Francesco Belardinelli, Borja G. Leon
Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviour of learning agents, and is therefore generally not suited to safety-critical applications. To address this issue, we present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally specified safety constraints during both training and testing. The proposed approach expresses the constraints to verify in Probabilistic Computation Tree Logic (PCTL) …
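As a rough illustration of the general idea of restricting a learner to formally verified actions, the sketch below masks an epsilon-greedy policy with a precomputed safe-action table. The abstraction, the `SAFE_ACTIONS` table, and all names are hypothetical stand-ins for the paper's model-checking step, not its actual implementation:

```python
import random

# Hypothetical output of a verification step: for each abstract state,
# the subset of actions that satisfies the PCTL safety constraint.
SAFE_ACTIONS = {
    "near_edge": {"stay", "step_back"},
    "open_area": {"stay", "step_back", "forward", "left", "right"},
}

def abstract(state):
    # Illustrative abstraction: map a concrete (x, y) position to a label.
    x, y = state
    return "near_edge" if x == 0 or y == 0 else "open_area"

def safe_epsilon_greedy(state, q_values, epsilon=0.1):
    """Epsilon-greedy action selection restricted to verified-safe actions."""
    allowed = SAFE_ACTIONS[abstract(state)]
    if random.random() < epsilon:
        return random.choice(sorted(allowed))
    # Exploit: highest-valued action among the safe ones only.
    return max(allowed, key=lambda a: q_values.get((state, a), 0.0))
```

Because the mask is applied during action selection, unsafe actions are never executed, at training time or at test time, regardless of what the learned Q-values prefer.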

Related papers

Assurance in Reinforcement Learning Using Quantitative Verification
This work presents an assured reinforcement learning (ARL) method that uses quantitative verification (QV) to restrict agent behaviour to areas satisfying safety, reliability, and performance constraints specified in probabilistic temporal logic.
Verification and repair of control policies for safe reinforcement learning
This work proposes a general-purpose automated methodology to verify risk bounds and repair the policies of agents trained by reinforcement learning; the approach, based on probabilistic model checking algorithms and tools, is shown to be more effective than comparable ones.
Safe Reinforcement Learning via Shielding
This work proposes a new approach to learn optimal policies while enforcing properties expressed in temporal logic by synthesizing a reactive system called a shield that monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification.
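The shield pattern described above can be sketched in a few lines, assuming a hypothetical `is_safe(state, action)` predicate standing in for the synthesized reactive system (all names here are illustrative):

```python
def is_safe(state, action):
    # Hypothetical safety monitor: forbid moving left at the boundary.
    return not (state == 0 and action == "left")

def shield(state, proposed_action, fallback="stay"):
    """Pass the learner's action through unchanged when it is safe;
    otherwise substitute a safe fallback action."""
    if is_safe(state, proposed_action):
        return proposed_action
    return fallback
```

The key design point is minimal interference: the shield corrects the learner only when the proposed action would violate the specification, so the learning signal is otherwise untouched.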
End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks
This work proposes a controller architecture that combines a model-free RL-based controller with model-based controllers utilizing control barrier functions (CBFs) and on-line learning of the unknown system dynamics, in order to ensure safety during learning.
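A minimal sketch of a CBF-style safety filter for a one-dimensional system x' = x + u, with the barrier function, dynamics, and grid search chosen purely for illustration (the cited work uses continuous dynamics and online model learning):

```python
def h(x):
    # Barrier function: the safe set is {x : h(x) >= 0}, here |x| <= 1.
    return 1.0 - x * x

def cbf_filter(x, u_rl, gamma=0.5, candidates=None):
    """Project an RL-proposed action onto the set satisfying the
    discrete-time CBF condition h(x + u) >= (1 - gamma) * h(x)."""
    if candidates is None:
        candidates = [i / 100.0 for i in range(-100, 101)]  # u in [-1, 1]
    feasible = [u for u in candidates if h(x + u) >= (1 - gamma) * h(x)]
    # Among feasible actions, take the one closest to the RL proposal.
    return min(feasible, key=lambda u: abs(u - u_rl))
```

For example, near the boundary (x = 0.8) a proposed step u = 0.5 would leave the safe set, so the filter returns the closest feasible step instead, while proposals that are already safe pass through unchanged.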
Shielded Decision-Making in MDPs
This work presents the concept of a shield that forces decision-making to provably adhere to safety requirements with high probability, together with a method to compute the probabilities with which decisions satisfy temporal logic constraints.
Multi-agent Reinforcement Learning: An Overview
This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks.
A comprehensive survey on safe reinforcement learning
This work categorizes and analyzes two approaches to safe reinforcement learning: modifying the optimality criterion (the classic discounted finite/infinite horizon) with a safety factor, and incorporating external knowledge or the guidance of a risk metric into the learning process.
Markov Games as a Framework for Multi-Agent Reinforcement Learning
Automatic shaping and decomposition of reward functions
This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a