Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams

@inproceedings{Biyik2022PartnerAwareAI,
  title={Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams},
  author={Erdem Biyik and Anusha Lalitha and Rajarshi Saha and Andrea J. Goldsmith and Dorsa Sadigh},
  booktitle={AAAI},
  year={2022}
}
When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to effectively collaborate in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent… 
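To make the coupled-reward setup concrete, here is a minimal runnable sketch, not the paper's partner-aware algorithm: two agents each run independent UCB1 on their own arms while the realized team reward depends on the joint choice through a payoff matrix. All quantities (K, the matrix R, the noise level) are invented for illustration; this is the naive, partner-unaware baseline that partner-aware methods aim to improve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coupled-reward setup: the team reward for joint arms (a1, a2)
# is R[a1, a2] plus noise; R is invented here for illustration.
K = 3
R = rng.uniform(0, 1, size=(K, K))

def pull(a1, a2):
    """Noisy team reward for the joint action (a1, a2)."""
    return R[a1, a2] + rng.normal(0, 0.1)

# Each agent runs plain UCB1 on the *team* reward of its own arm, ignoring
# its partner -- the partner-unaware baseline.
counts = np.zeros((2, K))
means = np.zeros((2, K))

T = 5000
total = 0.0
for t in range(1, T + 1):
    actions = []
    for i in range(2):
        if t <= K:                   # pull each arm once to initialize
            actions.append(t - 1)
        else:
            ucb = means[i] + np.sqrt(2 * np.log(t) / counts[i])
            actions.append(int(np.argmax(ucb)))
    r = pull(*actions)
    total += r
    for i, a in enumerate(actions):
        counts[i, a] += 1
        means[i, a] += (r - means[i, a]) / counts[i, a]

print(f"average team reward: {total / T:.3f}  (best joint: {R.max():.3f})")
```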

Learning to Advise Humans By Leveraging Algorithm Discretion

Evaluations on synthetic and real-world benchmark datasets, with a variety of simulated human accuracy and discretion behaviors, show that TR robustly improves the team's objective over interpretable, rule-based alternatives across settings.

References

Showing 1–10 of 28 references

Social Learning in Multi Agent Multi Armed Bandits

A novel algorithm is developed in which agents, whenever they choose to communicate, share only arm ids (never reward samples) with another agent chosen uniformly and independently at random, demonstrating that even a minimal level of collaboration among the agents enables a significant reduction in per-agent regret.
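A minimal sketch of this arm-id-only communication pattern, assuming simple UCB1 learners and an invented communication schedule (the actual protocol and constants in the paper differ):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: N agents run UCB1 on the same K arms; every 50 rounds an
# agent asks one uniformly random peer for the id of its current best arm
# and pulls that arm, so only arm ids -- never reward samples -- are shared.
N, K, T = 5, 4, 3000
mu = rng.uniform(0, 1, K)            # hypothetical true arm means

counts = np.zeros((N, K))
means = np.zeros((N, K))
for i in range(N):                   # initialize: each agent tries each arm once
    for a in range(K):
        counts[i, a] = 1
        means[i, a] = mu[a] + rng.normal(0, 0.1)

for t in range(K + 1, T + 1):
    for i in range(N):
        if t % 50 == 0:              # occasional communication round
            peer = rng.choice([j for j in range(N) if j != i])
            a = int(np.argmax(means[peer]))   # peer reveals only an arm id
        else:
            a = int(np.argmax(means[i] + np.sqrt(2 * np.log(t) / counts[i])))
        r = mu[a] + rng.normal(0, 0.1)
        counts[i, a] += 1
        means[i, a] += (r - means[i, a]) / counts[i, a]

print("each agent's empirical best arm:", np.argmax(means, axis=1))
```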

Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms

This work rigorously characterizes the influence of the communication graph structure on the group's decision-making performance, and proves performance guarantees for state-of-the-art frequentist and Bayesian algorithms for multi-agent MAB problems in which agents communicate according to a fixed network graph.

Decentralized Cooperative Stochastic Multi-armed Bandits

This work designs a fully decentralized algorithm that uses a running consensus procedure to compute, with some delay, accurate estimates of the average reward obtained by all agents for each arm, and then applies an upper confidence bound rule that accounts for the delay and error of those estimates.
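Here is a rough sketch of the running-consensus idea under assumed parameters (ring network, mixing weights, and an ad hoc inflated confidence bonus; the paper's exact bonus differs): each agent gossips its per-arm reward and count statistics with its neighbors, so everyone tracks roughly 1/N of the delayed network-wide totals and plays UCB on the resulting averages.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: 4 agents on a ring, 3 arms; mixing weights and the
# inflated confidence bonus are ad hoc choices for illustration only.
N, K, T = 4, 3, 4000
mu = rng.uniform(0, 1, K)            # hypothetical true arm means

W = np.zeros((N, N))                 # doubly stochastic mixing matrix (ring)
for i in range(N):
    W[i, i] = 0.5
    W[i, (i + 1) % N] = 0.25
    W[i, (i - 1) % N] = 0.25

S = np.zeros((N, K))                 # gossiped per-arm reward sums (~total/N)
C = np.zeros((N, K))                 # gossiped per-arm pull counts (~total/N)

for t in range(1, T + 1):
    new_r = np.zeros((N, K))
    new_c = np.zeros((N, K))
    for i in range(N):
        if t <= K:
            a = t - 1                        # initial sweep over the arms
        else:
            eff = np.maximum(N * C[i], 1e-9)        # effective sample counts
            bonus = np.sqrt(4.0 * np.log(t) / eff)  # widened to cover delay
            a = int(np.argmax(S[i] / np.maximum(C[i], 1e-9) + bonus))
        new_r[i, a] = mu[a] + rng.normal(0, 0.1)
        new_c[i, a] = 1.0
    S = W @ S + new_r                # one consensus (gossip) step per round
    C = W @ C + new_c

print("agent 0's consensus arm means:", np.round(S[0] / C[0], 2))
print("true arm means:              ", np.round(mu, 2))
```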

Communicating with Unknown Teammates

This research tackles the problem of communication in ad hoc teams, introducing a minimal version of the multi-agent, multi-armed bandit problem with limited communication between the agents.

Multi-armed bandits in multi-agent networks

This paper addresses the multi-armed bandit problem in a multi-player framework with a distributed variant of the well-known UCB1 algorithm that is optimal in the sense that, in a complete network, it scales down the regret of its single-player counterpart by the network size.
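The intuition behind the 1/N regret scaling can be sketched in a few lines: in a complete network, sharing every observation makes the N agents equivalent to a single UCB1 player that collects N samples per round. The sketch below pools statistics directly rather than implementing the paper's distributed variant:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed, simplified setting: a complete network where all observations are
# pooled, so N agents act like one UCB1 player taking N samples per round.
N, K, T = 6, 4, 2000
mu = rng.uniform(0, 1, K)            # hypothetical true arm means

counts = np.zeros(K)                 # shared, pooled statistics
sums = np.zeros(K)

for t in range(1, T + 1):
    for _ in range(N):               # all N agents act in every round
        if counts.min() == 0:
            a = int(np.argmin(counts))        # try each arm once first
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(N * t) / counts)
            a = int(np.argmax(ucb))
        counts[a] += 1
        sums[a] += mu[a] + rng.normal(0, 0.1)

print("fraction of pulls per arm:", np.round(counts / counts.sum(), 2))
print("best arm:", int(np.argmax(mu)))
```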

Learning from My Partner's Actions: Roles in Decentralized Robot Teams

This work defines separate roles for the agents in a team of robots so that teammates can correctly interpret the meaning behind their partner's actions, and suggests that leveraging and alternating roles leads to performance comparable to that of teams that explicitly exchange messages.

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

An algorithm is designed for each agent to maximize its own expected cumulative reward, and performance bounds that depend on the agents' sociability and the network structure are proved.

Cheap but Clever: Human Active Learning in a Bandit Setting

This work examines human behavioral data in a multi-armed bandit setting and finds that the knowledge gradient algorithm, which combines exact Bayesian learning with a decision policy that maximizes a combination of immediate reward gain and long-term knowledge gain, best captures subjects' trial-by-trial choices.
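For reference, a small sketch of the knowledge-gradient choice rule for independent Gaussian arms, in the standard form from the KG literature; the online weighting by the remaining horizon and all numeric values are illustrative assumptions, not taken from this paper:

```python
import numpy as np
from scipy.stats import norm

def kg_choice(mu, var, noise_var, horizon_left):
    """Pick the arm maximizing immediate mean + weighted knowledge gain."""
    kg = np.zeros(len(mu))
    for x in range(len(mu)):
        # std. dev. of the change in the posterior mean after one more pull
        sigma_tilde = var[x] / np.sqrt(var[x] + noise_var)
        best_other = np.max(np.delete(mu, x))
        z = -abs(mu[x] - best_other) / max(sigma_tilde, 1e-12)
        kg[x] = sigma_tilde * (z * norm.cdf(z) + norm.pdf(z))
    # online KG: immediate reward plus horizon-weighted value of information
    return int(np.argmax(mu + horizon_left * kg))

mu = np.array([0.40, 0.50, 0.45])    # posterior means (hypothetical)
var = np.array([0.20, 0.05, 0.30])   # posterior variances (hypothetical)
print("KG picks arm", kg_choice(mu, var, noise_var=0.1, horizon_left=50))
```

With these numbers KG picks the uncertain arm 2 rather than the highest-mean arm 1, illustrating how the knowledge-gain term drives exploration.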

Leveraging Observations in Bandits: Between Risks and Benefits

This paper introduces a new bandit optimism modifier that applies conditional optimism, contingent on the actions of an observed target, to guide the agent's exploration, and analyzes the effect of this modification on the well-known Upper Confidence Bound algorithm.
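The following is a speculative sketch of how such a conditional-optimism modifier might look (the target model, the bonus beta, and the schedule are all invented; the paper's actual modifier differs): run UCB1, but add extra optimism to whichever arm the observed target just played.

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented setup: K arms, a "target" that plays its preferred (here: best)
# arm 80% of the time, and an extra optimism term beta on the target's arm.
K, T, beta = 4, 3000, 0.05
mu = rng.uniform(0, 1, K)            # hypothetical true arm means
target_pref = int(np.argmax(mu))

counts = np.zeros(K)
sums = np.zeros(K)
for t in range(1, T + 1):
    target_arm = target_pref if rng.random() < 0.8 else int(rng.integers(K))
    if counts.min() == 0:
        a = int(np.argmin(counts))            # try each arm once first
    else:
        bonus = np.sqrt(2 * np.log(t) / counts)
        bonus[target_arm] += beta             # conditional optimism term
        a = int(np.argmax(sums / counts + bonus))
    counts[a] += 1
    sums[a] += mu[a] + rng.normal(0, 0.1)

print("share of pulls on the target's preferred arm:",
      round(counts[target_pref] / T, 2))
```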

Learning Latent Representations to Influence Multi-Agent Interaction

This work proposes a reinforcement learning-based framework for learning latent representations of another agent's policy: the ego agent identifies the relationship between its behavior and the other agent's future strategy, and leverages these latent dynamics to influence the other agent, purposely guiding them towards policies suitable for co-adaptation.