Corpus ID: 221081510

Learning Fair Policies in Multiobjective (Deep) Reinforcement Learning with Average and Discounted Rewards

Umer Siddique, Paul Weng, Matthieu Zimmer
As the operations of autonomous systems generally affect several users simultaneously, it is crucial that their designs account for fairness considerations. In contrast to standard (deep) reinforcement learning (RL), we investigate the problem of learning a policy that treats its users equitably. In this paper, we formulate this novel RL problem, in which an objective function, which encodes a notion of fairness that we formally define, is optimized. For this problem, we provide a theoretical…
Fairness for Cooperative Multi-Agent Learning with Equivariant Policies
This work introduces team fairness, a group-based fairness measure for multi-agent learning, and incorporates team fairness into policy optimization by introducing Fairness through Equivariance (Fair-E), a novel learning strategy that achieves provably fair reward distributions.
A Practical Guide to Multi-Objective Reinforcement Learning and Planning
This paper identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
Cooperative Multi-Agent Fairness and Equivariant Policies
It is proved that team fairness can be enforced during policy optimization by transforming the team's joint policy into an equivariant map. Fairness through Equivariance Regularization is introduced as a soft-constraint version of Fair-E, and is shown to reach higher levels of utility than Fair-E and fairer outcomes than non-equivariant policies.
Towards Return Parity in Markov Decision Processes
This work proposes return parity, a fairness notion that requires MDPs from different demographic groups that share the same state and action spaces to achieve approximately the same expected time-discounted rewards.
Fairness: From Static to Dynamic
  • Dell Zhang
  • 2021
Driven by the need to capture users’ evolving interests and optimize their long-term experiences, more and more recommender systems have started to model recommendation as a Markov decision process…
An Axiomatic Theory of Provably-Fair Welfare-Centric Machine Learning
This work defines a complementary measure, termed malfare, measuring overall societal harm (rather than wellbeing), with axiomatic justification via the standard axioms of cardinal welfare, and casts fair machine learning as malfare minimization over the risk values (expected losses) of each group.
Welfare-based Fairness through Optimization
It is argued that optimization models allow a wide range of fairness criteria to be formulated as social welfare functions, while enabling AI to take advantage of highly advanced solution technology, and that this approach supports a broad perspective on fairness motivated by general distributive justice considerations.
A Guide to Formulating Equity and Fairness in an Optimization Model
Optimization models typically seek to maximize overall benefit or minimize total cost. Yet equity and fairness are important elements of many practical decisions, and it is much less obvious how to…
Popcorn: Human-in-the-loop Popularity Debiasing in Conversational Recommender Systems
This paper proposes a human-in-the-loop popularity debiasing framework that integrates real-time semantic understanding of open-ended user utterances as well as historical records, while also effectively managing the dialogue with the user.
A Sociotechnical View of Algorithmic Fairness
Overall, 280 articles from various disciplines and including diverse viewpoints form the basis of this critical review: 166 in the conference set and 114 in the multidisciplinary set.


Optimizing Average Reward Using Discounted Rewards
  • S. Kakade
  • Computer Science, Mathematics
  • 2001
A bound is provided on the average reward of the policy obtained by solving the Bellman equations, which depends on the relationship between the discount factor and the mixing time of the Markov chain.
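As background for this entry, the standard identity connecting the two criteria (textbook material, stated here for an ergodic chain; this is not the paper's specific bound) is:

\[
  \rho^\pi \;=\; \lim_{\gamma \to 1}\, (1-\gamma)\, V_\gamma^\pi(s)
  \qquad \text{for every start state } s,
\]

where \(\rho^\pi\) is the average reward of policy \(\pi\) and \(V_\gamma^\pi\) its \(\gamma\)-discounted value; Kakade's result quantifies the rate of this convergence in terms of the chain's mixing time.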
Fairness in Reinforcement Learning
The study of fairness in reinforcement learning is initiated, and a provably fair polynomial time algorithm is provided under an approximate notion of fairness, thus establishing an exponential gap between exact and approximate fairness.
Near-optimal Regret Bounds for Reinforcement Learning
This work presents a reinforcement learning algorithm with total regret O(DS√AT) after T steps for any unknown MDP with S states, A actions per state, and diameter D, where the diameter is a newly proposed parameter: an MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' in at most D steps.
Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives
A no-regret algorithm is proposed, based on the Frank-Wolfe algorithm and UCRL2 (Jaksch et al., 2010), together with a crucial and novel gradient threshold procedure; it returns a non-stationary policy that diversifies the outcomes in order to optimize the objectives.
Infinite-Horizon Policy-Gradient Estimation
GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies, is introduced.
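As a sketch of the estimator this entry describes (notation simplified; observation \(y_t\), action \(a_t\), and reward \(r_t\) as usually defined), GPOMDP maintains an eligibility trace and averages reward-weighted traces:

\[
  z_{t+1} \;=\; \beta\, z_t \;+\; \nabla_\theta \log \mu_\theta(a_t \mid y_t),
  \qquad
  \Delta_T \;=\; \frac{1}{T} \sum_{t=1}^{T} r_t\, z_t,
\]

where \(\beta \in [0,1)\) controls a bias-variance trade-off: the bias of \(\Delta_T\) vanishes as \(\beta \to 1\), at the cost of higher variance.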
Learning Fairness in Multi-Agent Systems
This work proposes FEN, a novel hierarchical reinforcement learning model that easily learns both fairness and efficiency and significantly outperforms baselines in a variety of multi-agent scenarios.
Policy Gradient Methods for Reinforcement Learning with Function Approximation
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
Proximal Policy Optimization Algorithms
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
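A minimal sketch of the clipped surrogate objective this entry refers to (the function name and NumPy-based signature are illustrative, not the paper's code):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s) for the sampled actions
    advantage: advantage estimates for the same samples
    eps:       clipping range around 1 (PPO's paper uses 0.2)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum gives a pessimistic bound,
    # which discourages overly large policy updates.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

For example, with a probability ratio of 1.5 and a positive advantage of 1.0, the sample's contribution is clipped down to 1.2.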
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Approximation of Lorenz-Optimal Solutions in Multiobjective Markov Decision Processes
Methods to efficiently approximate the sets of Lorenz-non-dominated solutions of infinite-horizon, discounted MOMDPs are introduced, which are polynomial-sized subsets of those solutions.
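A small sketch of the Lorenz dominance relation underlying this entry (helper names are illustrative): a reward vector u Lorenz-dominates v when every cumulative sum of u's ascending-sorted components is at least v's, with at least one strict inequality.

```python
import numpy as np

def lorenz_vector(v):
    """Cumulative sums of the components sorted in ascending order
    (the worst-off users come first)."""
    return np.cumsum(np.sort(np.asarray(v, dtype=float)))

def lorenz_dominates(u, v):
    """True iff u Lorenz-dominates v: componentwise >= on the Lorenz
    vectors, with strict inequality in at least one position."""
    lu, lv = lorenz_vector(u), lorenz_vector(v)
    return bool(np.all(lu >= lv) and np.any(lu > lv))
```

For two users sharing a total reward of 4, the even split (2, 2) Lorenz-dominates the uneven split (3, 1), which is why Lorenz-optimal policies capture a preference for equitable outcomes.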