A practical guide to multi-objective reinforcement learning and planning

Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai Aravazhi Irissappane, Patrick Mannion, Ann Nowé, Gabriel de Oliveira Ramos, Marcello Restelli, Peter Vamplew and Diederik M. Roijers. Autonomous Agents and Multi-Agent Systems.

Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the… 
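The linear-combination simplification the guide cautions against can be made concrete with a short sketch (illustrative code, not from the paper; the function name and example weights are assumptions):

```python
import numpy as np

def linear_scalarise(reward_vec, weights):
    """Collapse a vector-valued reward into a scalar via a weighted sum.

    This is the simple linear combination criticised above: it is easy
    to implement, but it cannot recover trade-off policies lying on
    concave regions of the Pareto front.
    """
    return float(np.dot(weights, reward_vec))

# e.g. a two-objective reward (speed, safety) with equal weights
r = np.array([1.0, -0.5])
w = np.array([0.5, 0.5])
print(linear_scalarise(r, w))  # 0.25
```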

gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach

Generalized Thresholded Lexicographic Ordering (gTLO), a novel method that aims to combine non-linear MORL with the advantages of generalized MORL, is proposed, and a deep reinforcement learning realization of the algorithm is introduced.
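The thresholded lexicographic ordering that gTLO generalizes can be sketched as a comparison key (an illustrative simplification, not the paper's implementation; the function name and numbers are assumptions):

```python
def tlo_key(q_values, thresholds):
    """Thresholded lexicographic ordering key (a sketch, not gTLO itself).

    All objectives except the last are clipped at their thresholds, so
    exceeding a threshold earns no extra credit; ties then cascade to
    the next objective. Sorting actions by this key implements TLO
    action selection.
    """
    clipped = [min(q, t) for q, t in zip(q_values[:-1], thresholds)]
    return tuple(clipped) + (q_values[-1],)

# Action A meets the first objective's threshold (0.8); action B
# overshoots it, so the comparison falls through to the second objective.
a = tlo_key([0.9, 0.2], thresholds=[0.8])
b = tlo_key([1.5, 0.1], thresholds=[0.8])
print(a > b)  # True: both clip to 0.8 on objective 1, and 0.2 > 0.1
```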

Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning

This study presents several alternative methods that may be better suited to overcoming the noisy Q-value estimate issue while also finding SER-optimal policies in MOMDPs with stochastic transitions.

Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization

A novel algorithm is proposed that uses Generalized Policy Improvement (GPI) to define principled, formally derived prioritization schemes that improve sample-efficient learning; the method is empirically shown to outperform state-of-the-art MORL algorithms on challenging multi-objective tasks with both discrete and continuous state spaces.

PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

A novel MORL algorithm is proposed that trains a single universal network to cover the entire preference space, scales to continuous robotic tasks, and uses an order of magnitude fewer trainable parameters than prior approaches.

Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making

A new dominance criterion, known as expected scalarised returns (ESR) dominance, is defined that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice, and a new solution concept called the ESR set is defined as the set of policies that are ESR dominant.
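The first-order stochastic dominance relation that ESR dominance extends can be illustrated with an empirical check on sampled scalarised returns (a sketch under the maximisation convention; not the paper's code):

```python
import numpy as np

def fsd_dominates(x_samples, y_samples):
    """Empirical first-order stochastic dominance check (illustrative).

    X dominates Y if its empirical CDF lies at or below Y's everywhere
    (X is never more likely to fall short of any threshold), strictly
    somewhere. Applied to distributions of scalarised returns, this is
    the kind of pairwise test that ESR dominance builds on.
    """
    grid = np.union1d(x_samples, y_samples)
    cdf_x = np.array([np.mean(x_samples <= z) for z in grid])
    cdf_y = np.array([np.mean(y_samples <= z) for z in grid])
    return bool(np.all(cdf_x <= cdf_y) and np.any(cdf_x < cdf_y))

x = np.array([2.0, 3.0, 4.0])   # sampled returns of policy X
y = np.array([1.0, 2.0, 3.0])   # sampled returns of policy Y
print(fsd_dominates(x, y))  # True
```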

Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning

This work introduces a modular framework for the learning phase of such algorithms, making it easier to study the EED in InnerLoop MPMORL algorithms, and presents three new exploration strategies inspired by the metaheuristics domain.

Dominance Criteria and Solution Sets for the Expected Scalarised Returns

This paper proposes first-order stochastic dominance as a criterion to build solution sets that maximise expected utility, and proposes a new dominance criterion, known as expected scalarised returns (ESR) dominance, that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice.

A Multiobjective Reinforcement Learning Approach to Trip Building

This work formulates the problem of multiple agents learning to travel from A to B in a traffic network as a reinforcement learning task that takes into account non-stationarity, more than one objective, and a stochastic-game-based model.

Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

An SF-based extension of the Optimistic Linear Support algorithm is introduced to learn a set of policies whose SFs form a convex coverage set, and it is proved that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible task, without requiring any additional training samples.
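The generalized policy improvement step over successor features can be sketched as follows (a minimal illustration with made-up numbers; `gpi_action_values` and the toy SF values are assumptions, not the paper's code):

```python
import numpy as np

def gpi_action_values(psis, w):
    """Generalized policy improvement over successor features (sketch).

    psis: array of shape (n_policies, n_actions, d) holding each stored
    policy's successor features psi(s, a) at the current state.
    w: task weight vector of length d. Q-values for a new linear task
    are the dot products w . psi, and GPI acts greedily with respect to
    the maximum over the stored policies.
    """
    q = psis @ w          # shape (n_policies, n_actions)
    return q.max(axis=0)  # GPI value for each action

psis = np.array([[[1.0, 0.0], [0.0, 1.0]],   # policy 1's SFs
                 [[0.5, 0.5], [1.0, 1.0]]])  # policy 2's SFs
w = np.array([0.2, 0.8])                     # new linear task
q_gpi = gpi_action_values(psis, w)
best_action = int(np.argmax(q_gpi))
print(q_gpi, best_action)  # [0.5 1.0] 1
```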

Opponent learning awareness and modelling in multi-objective normal form games

This work considers two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion and contributes novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting.

Multi-Objective Decision Making

This book outlines how to deal with multiple objectives in decision-theoretic planning and reinforcement learning algorithms, and discusses three promising application domains for multi-objective decision making algorithms: energy, health, and infrastructure and transportation.

Meta-Learning for Multi-objective Reinforcement Learning

This paper introduces a novel MORL approach that trains a meta-policy (a policy simultaneously trained on multiple tasks sampled from a task distribution) over a number of randomly sampled Markov decision processes (MDPs), and demonstrates that this formulation yields a better approximation of the Pareto optimal solutions.
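The Pareto-optimal solution set that such approaches approximate can be sketched with a simple dominance filter (illustrative code, maximisation convention assumed; not from any of the papers above):

```python
import numpy as np

def pareto_front(points):
    """Filter a set of objective vectors down to the Pareto front.

    A point is kept unless some other point is at least as good on
    every objective and strictly better on at least one.
    """
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            keep.append(p.tolist())
    return keep

print(pareto_front([[1, 3], [2, 2], [3, 1], [1, 1]]))
# [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]  ([1, 1] is dominated by [2, 2])
```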

Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning

The results constitute the first empirical evidence that agents using potential-based reward shaping and difference rewards methodologies can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.

Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems

The algorithm proposed in this paper is an extension of Fitted Q-iteration (FQI) that enables learning the control policies for all linear combinations of preferences over the objectives in a single training process.

A Survey of Multi-Objective Sequential Decision-Making

This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.

Multi-objective Reinforcement Learning for the Expected Utility of the Return

A novel method based on policy gradient is proposed to learn good policies with respect to the expected value of the utility of the returns, and its effectiveness is demonstrated empirically.

A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

A generalized version of the Bellman equation is proposed to learn a single parametric representation for optimal policies over the space of all possible preferences in MORL, with the goal of enabling few-shot adaptation to new tasks.
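The idea of a single parametric model covering the whole preference space can be sketched by conditioning the model's input on the preference vector (a toy stand-in using a random linear map; none of the names or shapes here come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def preference_conditioned_q(state, w, params):
    """Sketch of a preference-conditioned value function.

    Instead of training one model per preference, the input is the
    concatenation [state; w], so a single parametric model serves the
    entire preference space. A random linear map stands in for the
    learned network; its 8-dimensional output could be reshaped into
    per-action, per-objective Q-values.
    """
    x = np.concatenate([state, w])  # condition on the preference vector
    return x @ params

state = rng.standard_normal(4)
w = np.array([0.3, 0.7])              # preference over 2 objectives
params = rng.standard_normal((6, 8))  # toy stand-in for learned weights
q = preference_conditioned_q(state, w, params)
print(q.shape)  # (8,)
```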

A novel adaptive weight selection algorithm for multi-objective multi-agent reinforcement learning

A thorough analysis of well-known scalarization schemes within the multi-objective multi-agent reinforcement learning setup is provided, and a novel adaptive weight algorithm is proposed that interacts with the underlying local multi-objective solvers and allows for better coverage of the Pareto front.

Relationship Explainable Multi-objective Reinforcement Learning with Semantic Explainability Generation

A vector value function based multi-objective reinforcement learning (V2f-MORL) approach is proposed that seeks to quantify the inter-objective relationship via reinforcement learning when the impact of one objective on others is unknown a priori, and is demonstrated via a MuJoCo-based robotics case study.