# A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

@inproceedings{Yang2019AGA, title={A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation}, author={Runzhe Yang and Xingyuan Sun and Karthik Narasimhan}, booktitle={NeurIPS}, year={2019} }

We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce…
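
For context, the "linear preferences" setting means each objective's return is weighted by a preference vector and summed into a scalar. A minimal sketch of this scalarization (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def scalarize(returns, preference):
    """Linear scalarization: combine a vector of per-objective
    returns into a single scalar utility via a preference vector."""
    returns = np.asarray(returns, dtype=float)
    preference = np.asarray(preference, dtype=float)
    assert returns.shape == preference.shape
    return float(returns @ preference)

# Two candidate policies with vector returns over (speed, safety);
# which one is "better" depends entirely on the unknown preference.
fast, safe = [10.0, 2.0], [4.0, 9.0]
print(scalarize(fast, [0.8, 0.2]))  # speed-favoring preference
print(scalarize(safe, [0.2, 0.8]))  # safety-favoring preference
```

Because the ranking of policies flips as the preference vector changes, a single fixed policy cannot be optimal for all preferences, which is the adaptation problem the paper targets.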

## Figures and Tables from this paper

## 78 Citations

### Provable Multi-Objective Reinforcement Learning with Generative Models

- Computer Science, ArXiv
- 2020

This work proposes a new algorithm called model-based envelop value iteration (EVI), which generalizes the envelope multi-objective $Q$-learning algorithm of Yang et al. (2019) and can learn a near-optimal value function with polynomial sample complexity and linear convergence speed.

### PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

- Computer Science
- 2022

The proposed approach, Preference-Driven MORL (PD-MORL), utilizes the preferences as guidance to update the network parameters, and is evaluated on challenging multi-objective continuous control tasks.

### gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach

- Computer Science, ArXiv
- 2022

Generalized Thresholded Lexicographic Ordering (gTLO), a novel method that aims to combine non-linear MORL with the advantages of generalized MORL, is proposed and a deep reinforcement learning realization of the algorithm is introduced.

### A Distributional View on Multi-Objective Policy Optimization

- Computer Science, ICML
- 2020

This paper proposes a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way; it computes improved action distributions for each objective and uses supervised learning to fit a parametric policy to their combination.

### Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

- Computer Science, ICML
- 2022

An SF-based extension of the Optimistic Linear Support algorithm is introduced to learn a set of policies whose SFs form a convex coverage set, and it is proved that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly expressible task, without requiring any additional training samples.
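
The generalized policy improvement (GPI) step mentioned above can be sketched for a single state: given successor features for each stored policy and a task weight vector, act greedily over all policies at once. This is a toy illustration under assumed array shapes, not the authors' implementation:

```python
import numpy as np

def gpi_action(successor_features, w):
    """Generalized policy improvement over a set of policies.

    successor_features: array of shape (n_policies, n_actions, d)
      holding psi_i(s, a) for a fixed state s and reward-feature dim d.
    w: task weight vector of shape (d,).
    Returns the action maximizing max_i psi_i(s, a) . w.
    """
    q = successor_features @ w           # (n_policies, n_actions)
    return int(q.max(axis=0).argmax())   # best action under the best policy

# Toy example: two policies, three actions, two reward features.
psi = np.array([
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],  # policy 0
    [[0.2, 0.2], [0.9, 0.1], [0.1, 0.9]],  # policy 1
])
print(gpi_action(psi, np.array([1.0, 0.0])))  # task rewarding feature 0
print(gpi_action(psi, np.array([0.0, 1.0])))  # task rewarding feature 1
```

Because the Q-values of every stored policy are obtained from the same SFs by a single dot product with `w`, transfer to a new linear task needs no additional samples.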

### On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

- Computer Science, ArXiv
- 2021

The principles underlying MORL are studied and a new algorithm, Distillation of a Mixture of Experts (DiME), is introduced that is intuitive and scale-invariant under some conditions and outperforms state-of-the-art on two established offline RL benchmarks.

### Influence-based Reinforcement Learning for Intrinsically-motivated Agents

- Computer Science, ArXiv
- 2021

This work presents an algorithmic framework of two RL agents, each with a different objective, and introduces a novel function-approximation approach to assess the influence of one policy on the other, which serves as the exploration criterion.

### Multi-Objective Reinforcement Learning with Non-Linear Scalarization

- Computer Science, AAMAS
- 2022

This paper considers the problem of MORL where multiple objectives are combined using a non-linear scalarization, and proposes a solution using steady-state occupancy measures and long-term average rewards to maximize the scalarized objective.

### Pareto Policy Adaptation

- Computer Science, ICLR
- 2022

This work introduces Pareto Policy Adaptation (PPA), a loss function that adapts the policy to be optimal with respect to any distribution over preferences, and uses implicit differentiation to back-propagate the loss gradient, bypassing the operations of the projected gradient descent solver.

### Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

- Computer Science, ArXiv
- 2022

An Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which can systematically incorporate ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDP problems, is proposed.

## References

Showing 1-10 of 50 references

### Learning all optimal policies with multiple criteria

- Computer Science, ICML '08
- 2008

The algorithm can be viewed as an extension to standard reinforcement learning for MDPs where, instead of repeatedly backing up maximal expected rewards, it backs up the set of expected rewards that are maximal for some set of linear preferences.
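
The set-based backup described above retains only value vectors that are optimal for at least one linear preference. A simplistic pruning sketch for the two-objective case, approximating this check over a grid of weights (illustrative only, not the ICML '08 algorithm itself):

```python
import numpy as np

def prune_to_coverage_set(vectors, n_weights=101):
    """Keep only value vectors that maximize w . v for at least one
    linear preference w = (t, 1 - t), approximated over a grid of t.
    Assumes exactly two objectives; purely illustrative."""
    vectors = np.asarray(vectors, dtype=float)
    ts = np.linspace(0.0, 1.0, n_weights)
    weights = np.stack([ts, 1.0 - ts], axis=1)   # (n_weights, 2)
    scores = weights @ vectors.T                 # (n_weights, n_vectors)
    keep = sorted(set(scores.argmax(axis=1)))    # winners for some weight
    return vectors[keep]

vs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6], [0.2, 0.2]]
print(prune_to_coverage_set(vs))  # the dominated [0.2, 0.2] is dropped
```

Vectors that win for no preference, such as the dominated `[0.2, 0.2]` above, are pruned; the survivors are exactly the set the backup needs to carry forward.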

### The Steering Approach for Multi-Criteria Reinforcement Learning

- Computer Science, NIPS
- 2001

An algorithm for achieving this task, based on the theory of approachability for stochastic games, is devised; it combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms.

### Dynamic preferences in multi-criteria reinforcement learning

- Computer Science, Economics, ICML
- 2005

This paper considers the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance, and proposes a method that allows us to store a finite number of policies, choose an appropriate policy for any weight vector and improve upon it.

### Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems

- Computer Science, The 2012 International Joint Conference on Neural Networks (IJCNN)
- 2012

The algorithm proposed in this paper is an extension of Fitted Q-iteration (FQI) that enables learning the control policies for all linear combinations of preferences assigned to the objectives in a single training process.

### Dynamic Weights in Multi-Objective Deep Reinforcement Learning

- Computer Science, ICML
- 2019

This work proposes a multi-objective Q-network whose outputs are conditioned on the relative importance of objectives, and introduces Diverse Experience Replay (DER) to counter the inherent non-stationarity of the dynamic weights setting.
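
The idea of a Q-network whose outputs are conditioned on the objective weights can be illustrated minimally: concatenate the preference vector to the state before the forward pass, so the same parameters produce different Q-values per preference. A toy linear stand-in for the network (names and shapes are assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy preference-conditioned Q-function: the input is the state
# features concatenated with the preference weights; the output is
# one Q-value per action. A real implementation uses a deep network.
STATE_DIM, N_OBJECTIVES, N_ACTIONS = 4, 2, 3
W = rng.normal(size=(STATE_DIM + N_OBJECTIVES, N_ACTIONS))

def q_values(state, preference):
    x = np.concatenate([state, preference])
    return x @ W  # shape (N_ACTIONS,)

s = rng.normal(size=STATE_DIM)
print(q_values(s, np.array([1.0, 0.0])))  # Q-values under one preference
print(q_values(s, np.array([0.0, 1.0])))  # same state, shifted preference
```

The point of the conditioning is visible even in this linear toy: the same state yields different Q-values, and hence potentially different greedy actions, as the weight vector changes.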

### Manifold-based multi-objective policy search with sample reuse

- Computer Science, Neurocomputing
- 2017

### Continuous Deep Q-Learning with Model-based Acceleration

- Computer Science, ICML
- 2016

This paper derives a continuous variant of the Q-learning algorithm, which it calls normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.

### Multiobjective Reinforcement Learning: A Comprehensive Overview

- Computer Science, IEEE Transactions on Systems, Man, and Cybernetics: Systems
- 2015

The basic architecture, research topics, and naïve solutions of MORL are first introduced, and several representative MORL approaches and important directions of recent research are comprehensively reviewed.

### Continuous control with deep reinforcement learning

- Computer Science, ICLR
- 2016

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.

### Parallel reinforcement learning for weighted multi-criteria model with adaptive margin

- Computer Science, Cognitive Neurodynamics
- 2008

Although an algorithm exists for calculating results equivalent to Q-learning for each weighted task simultaneously, it suffers from an explosion of set sizes; a parallel method with an adaptive margin is introduced to overcome this difficulty.