Corpus ID: 235435648

# Planning to Fairly Allocate: Probabilistic Fairness in the Restless Bandit Setting

@article{Herlihy2021PlanningTF,
title={Planning to Fairly Allocate: Probabilistic Fairness in the Restless Bandit Setting},
author={Christine Herlihy and Aviva Prins and Aravind Srinivasan and John P. Dickerson},
journal={ArXiv},
year={2021},
volume={abs/2106.07677}
}
Restless and collapsing bandits are commonly used to model constrained resource allocation in settings featuring arms with action-dependent transition probabilities, such as allocating health interventions among patients [Whittle, 1988; Mate et al., 2020]. However, state-of-the-art Whittle index-based approaches to this planning problem either do not consider fairness among arms, or incentivize fairness without guaranteeing it [Mate et al., 2021]. Additionally, their optimality guarantees only…

#### References

SHOWING 1-10 OF 51 REFERENCES
Combinatorial Sleeping Bandits with Fairness Constraints
• Computer Science, Mathematics
• IEEE INFOCOM 2019 - IEEE Conference on Computer Communications
• 2019
A new Combinatorial Sleeping multi-armed bandit model with Fairness constraints, called CSMAB-F, is proposed to address crucial modeling issues, and LFG is rigorously proved to be not only feasibility-optimal but also to have an upper-bounded time-average regret.
Fairness in Learning: Classic and Contextual Bandits
• Computer Science, Mathematics
• NIPS
• 2016
A tight connection between fairness and the KWIK (Knows What It Knows) learning model is proved, which yields a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension and shows a worst-case exponential gap in regret between fair and non-fair learning algorithms.
Indexability of Restless Bandit Problems and Optimality of Whittle Index for Dynamic Multichannel Access
• Computer Science, Mathematics
• IEEE Transactions on Information Theory
• 2010
This work establishes indexability, obviates the need to know the Markov transition probabilities in the Whittle index policy, and develops efficient algorithms for computing a performance upper bound given by Lagrangian relaxation.
Fair Contextual Multi-Armed Bandits: Theory and Experiments
• Computer Science, Mathematics
• UAI
• 2020
A Multi-Armed Bandit algorithm with fairness constraints is introduced, where fairness is defined as a minimum rate at which a task or a resource is assigned to a user.
On an Index Policy for Restless Bandits
We investigate the optimal allocation of effort to a collection of n projects. The projects are 'restless' in that the state of a project evolves in time, whether or not it is allocated effort. The…
Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems
• Computer Science
• NeurIPS
• 2019
This paper analyzes the performance of Thompson sampling in episodic restless bandits with unknown parameters, considers a general policy map to define a competitor, and proves an $\tilde{\mathcal{O}}(\sqrt{T})$ Bayesian regret bound.
Collapsing Bandits and Their Application to Public Health Interventions
• Computer Science, Mathematics
• NeurIPS
• 2020
A new restless multi-armed bandit setting is introduced in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, its state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve.
Incorporating Healthcare Motivated Constraints in Restless Bandit Based Resource Allocation
As reinforcement learning plays an increasingly important role in healthcare, there is a pressing need to identify mechanisms to incorporate practitioner expertise. One notable case is in improving…
Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits
• Computer Science
• AAMAS
• 2021
An RMAB solution to HMIPs is developed that allows for reward functions that are monotone increasing, rather than linear, in the belief state and also supports a wider class of observations; theoretical guarantees are proved on the asymptotic optimality of the algorithm for any arbitrary reward function.
Restless Bandits: Activity Allocation in a Changing World
We consider a population of n projects which in general continue to evolve whether in operation or not (although by different rules). It is desired to choose the projects in operation at each instant…