• Corpus ID: 220646981

Multi-Principal Assistance Games

Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart J. Russell
Assistance games (also known as cooperative inverse reinforcement learning games) have been proposed as a model for beneficial AI, wherein a robotic agent must act on behalf of a human principal but is initially uncertain about the human's payoff function. This paper studies multi-principal assistance games, which cover the more general case in which the robot acts on behalf of N humans who may have widely differing payoffs. Impossibility theorems in social choice theory and voting theory can be… 
Understanding Learned Reward Functions
This paper applies saliency methods to identify failure modes and predict the robustness of reward functions, and discovers that learned reward functions often implement surprising algorithms that rely on contingent aspects of the environment.
Epistemic Defenses against Scientific and Empirical Adversarial AI Attacks
This transdisciplinary analysis suggests that employing distinct explanation-anchored, trust-disentangled and adversarial strategies is one possible principled complementary epistemic defense against SEA AI attacks – albeit with caveats yielding incentives for future work.


Cooperative Inverse Reinforcement Learning
It is shown that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, it is proved that optimality in isolation is suboptimal in CIRL, and an approximate CIRL algorithm is derived.
The Assistive Multi-Armed Bandit
This work introduces the assistive multi-armed bandit, in which a robot assists a human playing a bandit task to maximize cumulative reward, as a contribution towards a theory behind algorithms for human-robot interaction.
Reinforcement Learning with Fairness Constraints for Resource Distribution in Human-Robot Teams
This work introduces a multi-armed bandit algorithm with fairness constraints, where a robot distributes resources to human teammates of different skill levels, and defines fairness as a constraint on the minimum rate that each human teammate is selected throughout the task.
Learning to Interactively Learn and Assist
This paper introduces a multi-agent training framework that enables an agent to learn from another agent who knows the current task, and produces an agent that is capable of learning interactively from a human user, without a set of explicit demonstrations or a reward function.
Deriving Consensus in Multiagent Systems
This article examines how the Clarke tax could be used as an effective consensus mechanism in domains consisting of automated agents, and considers how agents can come to a consensus without needing to reveal full information about their preferences and without needing to generate alternatives prior to the voting process.
Fairness in Multi-Agent Sequential Decision-Making
A fairness solution criterion for multi-agent decision-making problems, where agents have local interests, is defined, and a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy are developed.
Optimal social choice functions: A utilitarian view
This work adopts a utilitarian perspective on social choice, assuming that agents have (possibly latent) utility functions over some space of alternatives and studies optimal social choice functions under three different models, and underscores the important role played by scoring functions.
Inverse Reward Design
This work introduces inverse reward design (IRD) as the problem of inferring the true objective based on the designed reward and the training MDP, and introduces approximate methods for solving IRD problems, and uses their solution to plan risk-averse behavior in test MDPs.
Apprenticeship learning via inverse reinforcement learning
This work thinks of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and gives an algorithm for learning the task demonstrated by the expert, based on using "inverse reinforcement learning" to try to recover the unknown reward function.
Straightforwardness of Game Forms with Lotteries as Outcomes
A game form is any system which makes an outcome depend on individual actions of some kind, called strategies. Prime examples are systems of voting. Where voting consists of each person's marking a