Corpus ID: 201319046

A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation

Runzhe Yang, Xingyuan Sun, Karthik Narasimhan
We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce… 
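The linear-preference setting described above can be sketched in a few lines: a preference vector from the simplex scalarizes the vector-valued reward, and a single model is trained across sampled preferences. A minimal NumPy illustration (function names are hypothetical, not from the paper):

```python
import numpy as np

def sample_preference(num_objectives, rng):
    """Draw a random preference vector from the probability simplex."""
    return rng.dirichlet(np.ones(num_objectives))

def scalarize(vector_reward, w):
    """Linear scalarization: the agent's scalar utility is w . r."""
    return float(np.dot(w, vector_reward))

rng = np.random.default_rng(0)
w = sample_preference(2, rng)   # preference unknown to the agent a priori
r = np.array([1.0, 3.0])        # vector reward over two competing objectives
u = scalarize(r, w)
```

Because the preference enters only through the dot product, a single preference-conditioned model can in principle cover the whole simplex.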

Provable Multi-Objective Reinforcement Learning with Generative Models

This work proposes a new algorithm called model-based envelope value iteration (EVI), which generalizes the envelope multi-objective $Q$-learning algorithm of Yang et al., 2019 and can learn a near-optimal value function with polynomial sample complexity and linear convergence speed.
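As a reference point, the envelope optimality operator that both algorithms build on can be written as follows (a sketch following the formulation in Yang et al., 2019; notation assumed):

```latex
(\mathcal{T}\mathbf{Q})(s,a,\boldsymbol{\omega})
  = \mathbf{r}(s,a)
  + \gamma \,\mathbb{E}_{s'}\!\left[\mathbf{Q}(s', a^{*}, \boldsymbol{\omega}^{*})\right],
\qquad
(a^{*}, \boldsymbol{\omega}^{*})
  \in \arg\max_{a',\,\boldsymbol{\omega}'} \boldsymbol{\omega}^{\top}\mathbf{Q}(s', a', \boldsymbol{\omega}').
```

The maximization ranges over both actions and preferences, but the scalarization uses the current preference $\boldsymbol{\omega}$; this is what lets one vector-valued $\mathbf{Q}$ serve the whole preference space.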

PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

The proposed approach, Preference-Driven MORL (PD-MORL), utilizes the preferences as guidance to update the network parameters, and is evaluated on challenging multi-objective continuous control tasks.

gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach

Generalized Thresholded Lexicographic Ordering (gTLO), a novel method that aims to combine non-linear MORL with the advantages of generalized MORL, is proposed and a deep reinforcement learning realization of the algorithm is introduced.

A Distributional View on Multi-Objective Policy Optimization

This paper proposes a novel algorithm for multi-objective reinforcement learning that enables setting desired preferences for objectives in a scale-invariant way: it learns an improved action distribution per objective and uses supervised learning to fit a parametric policy to a combination of these distributions.

Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

An SF-based extension of the Optimistic Linear Support algorithm is introduced to learn a set of policies whose SFs form a convex coverage set and it is proved that policies in this set can be combined via generalized policy improvement to construct optimal behaviors for any new linearly-expressible tasks, without requiring any additional training samples.
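The combination step described above is generalized policy improvement (GPI) over successor features: for a new task with weight vector w, act greedily with respect to the best scalarized value promised by any stored policy. A small NumPy sketch (data layout hypothetical):

```python
import numpy as np

def gpi_action(psi_set, w, state):
    """Generalized policy improvement over successor features: pick the
    action whose best stored policy promises the highest scalarized
    return w . psi_i(s, a)."""
    # psi_set[i][state] is an (n_actions, n_features) array holding the
    # successor features of stored policy i at `state`.
    values = np.stack([psi[state] @ w for psi in psi_set])  # (n_policies, n_actions)
    return int(np.unravel_index(values.argmax(), values.shape)[1])

# Toy example: two stored policies, one state, two actions, two features.
psi_a = {0: np.array([[1.0, 0.0], [0.0, 0.0]])}  # policy specialized on feature 0
psi_b = {0: np.array([[0.0, 0.0], [0.0, 1.0]])}  # policy specialized on feature 1
a = gpi_action([psi_a, psi_b], np.array([0.1, 0.9]), 0)
```

With w weighted toward the second feature, GPI selects the action favored by the policy specialized on that feature, without any additional training samples.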

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

The principles underlying MORL are studied and a new algorithm, Distillation of a Mixture of Experts (DiME), is introduced that is intuitive and scale-invariant under some conditions and outperforms state-of-the-art on two established offline RL benchmarks.

Influence-based Reinforcement Learning for Intrinsically-motivated Agents

This work presents an algorithmic framework of two RL agents, each with a different objective; it introduces a novel function approximation approach to assess the influence F of one policy on the others, which serves as the exploration criterion.

Multi-Objective Reinforcement Learning with Non-Linear Scalarization

This paper considers the problem of MORL where multiple objectives are combined using a non-linear scalarization, and proposes a solution using steady-state occupancy measures and long-term average rewards to maximize the scalarized objective.
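The setup summarized above can be stated as an optimization over steady-state occupancy measures (a sketch with assumed notation): with $f$ a non-linear scalarization and $\rho$ an occupancy measure over state-action pairs,

```latex
\max_{\rho \in \Delta}\; f\!\left( \sum_{s,a} \rho(s,a)\, \mathbf{r}(s,a) \right)
```

where $\Delta$ is the set of valid steady-state occupancy measures and the inner sum is the vector of long-term average rewards; the non-linearity of $f$ is what prevents a simple weighted-sum reduction.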

Pareto Policy Adaptation

This work introduces Pareto Policy Adaptation (PPA), a loss function that adapts the policy to be optimal with respect to any distribution over preferences, and uses implicit differentiation to back-propagate the loss gradient bypassing the operations of the projected gradient descent solver.

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

An Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which can systematically incorporate ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDP problems, is proposed.

Learning all optimal policies with multiple criteria

The algorithm can be viewed as an extension of standard reinforcement learning for MDPs: instead of repeatedly backing up maximal expected rewards, it backs up the set of expected rewards that are maximal for some set of linear preferences.
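The set being backed up is the set of vector returns that win under at least one linear preference (a convex coverage set). A toy NumPy sketch of the filtering step, approximating the exact hull computation with sampled weights (names and the sampling shortcut are illustrative, not the paper's method):

```python
import numpy as np

def linear_coverage(points, n_weights=200, seed=0):
    """Keep the vector returns that are optimal for at least one linear
    preference w. Approximated here by sampling weights from the
    simplex rather than computing the exact convex hull."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    keep = set()
    for _ in range(n_weights):
        w = rng.dirichlet(np.ones(pts.shape[1]))
        keep.add(int(np.argmax(pts @ w)))
    return sorted(keep)

# [1, 1] is never optimal for any w; [3, 0] and [0, 3] each are.
idx = linear_coverage([[3.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
```

Returns that are dominated for every linear preference, like `[1, 1]` here, are pruned, which is exactly what keeps the backed-up sets from growing without bound.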

The Steering Approach for Multi-Criteria Reinforcement Learning

An algorithm for achieving this task is devised, based on the theory of approachability for stochastic games; it combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms.

Dynamic preferences in multi-criteria reinforcement learning

This paper considers the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance, and proposes a method that allows us to store a finite number of policies, choose an appropriate policy for any weight vector and improve upon it.
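The policy-selection step described above is a simple maximization over the stored policies' expected vector returns. A minimal sketch (function name hypothetical):

```python
import numpy as np

def select_policy(value_vectors, w):
    """Pick, among a finite set of stored policies, the one whose
    expected vector return scores highest under the current weights."""
    return int(np.argmax(np.asarray(value_vectors) @ np.asarray(w)))

# Two stored policies, each specialized on one objective.
i = select_policy([[2.0, 0.0], [0.0, 2.0]], [0.8, 0.2])
```

When the weights shift at runtime, the same finite store serves the new preference: re-running the selection picks a different policy, which is then improved further rather than learned from scratch.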

Tree-based Fitted Q-iteration for Multi-Objective Markov Decision problems

The algorithm proposed in this paper is an extension of Fitted Q-iteration (FQI) that enables learning the control policies for all the linear combinations of preferences assigned to the objectives in a single training process.

Dynamic Weights in Multi-Objective Deep Reinforcement Learning

This work proposes a multi-objective Q-network whose outputs are conditioned on the relative importance of objectives, and introduces Diverse Experience Replay (DER) to counter the inherent non-stationarity of the dynamic weights setting.

Continuous Deep Q-Learning with Model-based Acceleration

This paper derives a continuous variant of the Q-learning algorithm, called normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.
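NAF makes continuous Q-learning tractable by restricting the advantage to a quadratic in the action, so the greedy action is available in closed form (it is simply the mean). A NumPy sketch of the value shape (variable names assumed):

```python
import numpy as np

def naf_q(a, mu, P, V):
    """Normalized advantage function: Q(s, a) = V(s) - 0.5 (a - mu)^T P (a - mu).
    With P positive definite, the quadratic advantage is maximized at a = mu,
    so argmax_a Q needs no inner optimization."""
    d = a - mu
    return V - 0.5 * d @ P @ d

mu = np.array([0.2, -0.1])                # greedy action output by the network
P = np.array([[2.0, 0.0], [0.0, 1.0]])    # positive-definite precision term
q_at_mu = naf_q(mu, mu, P, V=1.0)         # advantage vanishes at the mean
q_off = naf_q(mu + 0.5, mu, P, V=1.0)     # any other action scores lower
```

This closed-form maximum is what lets a single Q-network play both critic and actor over continuous action spaces.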

Multiobjective Reinforcement Learning: A Comprehensive Overview

The basic architecture, research topics, and naïve solutions of MORL are introduced first, and then several representative MORL approaches and some important directions of recent research are comprehensively reviewed.

Continuous control with deep reinforcement learning

This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
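The deterministic policy gradient underlying this actor-critic method takes the following standard form (notation as in the deterministic policy gradient literature):

```latex
\nabla_{\theta} J(\mu_{\theta})
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_{a} Q^{\mu}(s,a)\big|_{a=\mu_{\theta}(s)}\;
      \nabla_{\theta} \mu_{\theta}(s)
    \right]
```

The actor $\mu_{\theta}$ is updated by chaining the critic's action-gradient through the policy, which avoids integrating over a continuous action space.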

Parallel reinforcement learning for weighted multi-criteria model with adaptive margin

Although an algorithm exists for calculating results equivalent to Q-learning for each preference simultaneously, it suffers from an explosion in the size of the maintained sets; a parallel reinforcement learning method with an adaptive margin is introduced to overcome this difficulty.