A practical guide to multi-objective reinforcement learning and planning
@article{Hayes2021APG, title={A practical guide to multi-objective reinforcement learning and planning}, author={Conor F. Hayes and Roxana R{\u{a}}dulescu and Eugenio Bargiacchi and Johan K{\"a}llstr{\"o}m and Matthew Macfarlane and Mathieu Reymond and Timothy Verstraeten and Luisa M. Zintgraf and Richard Dazeley and Fredrik Heintz and Enda Howley and Athirai Aravazhi Irissappane and Patrick Mannion and Ann Now{\'e} and Gabriel de Oliveira Ramos and Marcello Restelli and Peter Vamplew and Diederik M. Roijers}, journal={Autonomous Agents and Multi-Agent Systems}, year={2021}, volume={36}, pages={1--59} }
Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the…
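The abstract's point about linear combinations can be illustrated with a small sketch. The three policies and their two-objective returns below are invented for illustration and do not come from the paper; `min(r1, r2)` is just one assumed example of a non-linear utility. The sketch suggests that a Pareto-optimal policy lying in a concave region of the front is never selected by any linear weighting, although a user with a non-linear utility may prefer it.

```python
# Hypothetical two-objective returns for three policies (invented numbers).
returns = {
    "A": (1.0, 0.0),
    "B": (0.4, 0.4),  # balanced trade-off, lies in a concave region of the front
    "C": (0.0, 1.0),
}


def dominates(u, v):
    """True if u Pareto-dominates v: no worse in every objective, better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(candidates):
    """Names of all policies that no other policy dominates."""
    return sorted(
        name
        for name, u in candidates.items()
        if not any(dominates(v, u) for other, v in candidates.items() if other != name)
    )


def linear_winners(candidates, steps=101):
    """Policies that maximise w*r1 + (1-w)*r2 for at least one w on a grid in [0, 1]."""
    winners = set()
    for i in range(steps):
        w = i / (steps - 1)
        best = max(
            candidates,
            key=lambda n: w * candidates[n][0] + (1 - w) * candidates[n][1],
        )
        winners.add(best)
    return sorted(winners)


def nonlinear_winner(candidates):
    """Best policy under the assumed non-linear utility u(r) = min(r1, r2)."""
    return max(candidates, key=lambda n: min(candidates[n]))


print(pareto_front(returns))      # all three policies are Pareto-optimal
print(linear_winners(returns))    # policies reachable by some linear weighting
print(nonlinear_winner(returns))  # policy preferred under min(r1, r2)
```

In this toy setting policy B scores 0.4 under every weight, while the better of A and C always scores at least 0.5, so no linear weighting picks B even though it is the preferred choice under the assumed `min` utility.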
54 Citations
gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach
- Computer Science
- ArXiv
- 2022
Generalized Thresholded Lexicographic Ordering (gTLO), a novel method that aims to combine non-linear MORL with the advantages of generalized MORL, is proposed and a deep reinforcement learning realization of the algorithm is introduced.
Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning
- Computer Science
- ArXiv
- 2022
This study presents several alternative methods that may be better suited to overcoming the noisy Q-value estimate issue and to finding SER-optimal policies in MOMDPs with stochastic transitions.
Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization
- Computer Science
- ArXiv
- 2023
A novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes that improve sample-efficient learning and empirically shows that the method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks, both with discrete and continuous state spaces.
PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm
- Computer Science
- ArXiv
- 2022
A novel MORL algorithm is proposed that trains a single universal network to cover the entire preference space, scales to continuous robotic tasks, and uses an order of magnitude fewer trainable parameters than prior approaches.
Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making
- Computer Science
- Neural Computing and Applications
- 2022
A new dominance criterion, known as expected scalarised returns (ESR) dominance, is defined that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice, together with a new solution concept called the ESR set, which is a set of policies that are ESR dominant.
Metaheuristics-based Exploration Strategies for Multi-Objective Reinforcement Learning
- Computer Science
- Proceedings of the 14th International Conference on Agents and Artificial Intelligence
- 2022
This work introduces a modular framework for the learning phase of such algorithms, easing the study of the EED in InnerLoop MPMORL algorithms, and presents three new exploration strategies inspired by the metaheuristics domain.
Dominance Criteria and Solution Sets for the Expected Scalarised Returns
- Economics, Computer Science
- 2021
This paper proposes first-order stochastic dominance as a criterion to build solution sets to maximise expected utility and proposes a new dominance criterion, known as expected scalarised returns (ESR) dominance, that extends first-order stochastic dominance to allow a set of optimal policies to be learned in practice.
A Multiobjective Reinforcement Learning Approach to Trip Building
- Computer Science
- ATT@IJCAI
- 2022
This work formulates the problem of multiple agents learning to travel from A to B in a traffic network as a reinforcement learning task that takes into account non-stationarity, more than one objective, and a stochastic-game-based model.
Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer
- Computer Science
- ICML
- 2022
An SF-based extension of the Optimistic Linear Support algorithm is introduced to learn a set of policies whose SFs form a convex coverage set and it is proved that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible tasks, without requiring any additional training samples.
Opponent learning awareness and modelling in multi-objective normal form games
- Economics
- Neural Computing and Applications
- 2021
This work considers two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion and contributes novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting.
References
Multi-Objective Decision Making
- Computer Science
- Synthesis Lectures on Artificial Intelligence and Machine Learning
- 2017
This book outlines how to deal with multiple objectives in decision-theoretic planning and reinforcement learning algorithms, and discusses three promising application domains for multi-objective decision making algorithms: energy, health, and infrastructure and transportation.
Additional planning with multiple objectives for reinforcement learning
- Computer Science
- Knowl. Based Syst.
- 2020
Meta-Learning for Multi-objective Reinforcement Learning
- Computer Science
- 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- 2019
This paper introduces a novel MORL approach by training a meta-policy, a policy simultaneously trained with multiple tasks sampled from a task distribution, for a number of randomly sampled Markov decision processes (MDPs) and demonstrates that this formulation results in a better approximation of the Pareto optimal solutions.
Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning
- Computer Science
- The Knowledge Engineering Review
- 2018
The results constitute the first empirical evidence that agents using potential-based reward shaping and difference rewards methodologies can sample true Pareto optimal solutions in cooperative multi-objective stochastic games.
Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems
- Computer Science
- The 2012 International Joint Conference on Neural Networks (IJCNN)
- 2012
The algorithm proposed in this paper is an extension of Fitted Q-iteration (FQI) that learns the control policies for all linear combinations of preferences assigned to the objectives in a single training process.
A Survey of Multi-Objective Sequential Decision-Making
- Computer Science
- J. Artif. Intell. Res.
- 2013
This article surveys algorithms designed for sequential decision-making problems with multiple objectives and proposes a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function, and the type of policies considered.
Multi-objective Reinforcement Learning for the Expected Utility of the Return
- Economics, Computer Science
- 2020
A novel method is proposed, based on policy gradient, to learn good policies with respect to the expected value of the utility of the returns, and it is shown empirically that this method is key to learning good policies.
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation
- Computer Science
- NeurIPS
- 2019
A generalized version of the Bellman equation is proposed to learn a single parametric representation for optimal policies over the space of all possible preferences in MORL, with the goal of enabling few-shot adaptation to new tasks.
A novel adaptive weight selection algorithm for multi-objective multi-agent reinforcement learning
- Computer Science
- 2014 International Joint Conference on Neural Networks (IJCNN)
- 2014
A thorough analysis of well-known scalarization schemes within the multi-objective multi-agent reinforcement learning setup is presented, and a novel adaptive weight algorithm is proposed that interacts with the underlying local multi-objective solvers and allows for better coverage of the Pareto front.
Relationship Explainable Multi-objective Reinforcement Learning with Semantic Explainability Generation
- Computer Science
- ArXiv
- 2019
A vector value function based multi-objective reinforcement learning (V2f-MORL) approach is proposed that seeks to quantify the inter-objective relationship via reinforcement learning when the impact of one objective on others is unknown a priori; it is demonstrated via a MuJoCo-based robotics case study.