• Corpus ID: 220425420

Collapsing Bandits and Their Application to Public Health Interventions

@article{Mate2020CollapsingBA,
  title={Collapsing Bandits and Their Application to Public Health Interventions},
  author={Aditya Mate and J. Killian and Haifeng Xu and A. Perrault and Milind Tambe},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.04432}
}
We propose and study Collapsing Bandits, a new restless multi-armed bandit (RMAB) setting in which each arm follows a binary-state Markovian process with a special structure: when an arm is played, the state is fully observed, thus "collapsing" any uncertainty, but when an arm is passive, no observation is made, thus allowing uncertainty to evolve. The goal is to keep as many arms in the "good" state as possible by planning a limited budget of actions per round. Such Collapsing Bandits are… 
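The belief dynamics described above can be made concrete for a single arm. A minimal sketch in Python, where the transition probabilities (0.9 and 0.4) are illustrative assumptions rather than values from the paper:

```python
# Belief evolution for one binary-state arm in a Collapsing Bandit.
# P[s] = Pr(next state is "good" | current state s); numbers are assumed.
P = {1: 0.9, 0: 0.4}

def evolve_passive(belief):
    """One step of belief evolution when the arm is NOT played."""
    return belief * P[1] + (1.0 - belief) * P[0]

def observe(state):
    """Playing the arm reveals the true state, collapsing uncertainty."""
    return 1.0 if state == 1 else 0.0

b = observe(1)       # arm played, seen in the good state -> belief is exactly 1.0
for _ in range(3):   # three passive rounds: uncertainty creeps back in
    b = evolve_passive(b)
```

After a pull the belief is a point mass (0 or 1); each passive round then contracts it toward the chain's stationary probability of the good state, which is exactly the structure the planner exploits.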

Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits

An RMAB solution to HMIPs is developed that allows for reward functions that are monotone increasing, rather than linear, in the belief state, supports a wider class of observations, and comes with theoretical guarantees on the asymptotic optimality of the algorithm for any arbitrary reward function.

Learning Index Policies for Restless Bandits with Application to Maternal Healthcare

This work proposes a mechanism for balancing the explore-exploit trade-off, and finds that the proposed mechanism outperforms the baseline intervention scheme on a maternal healthcare dataset.

Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare

This paper proposes a Whittle index based Q-Learning mechanism and shows that it converges to the optimal solution of a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention.
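The Whittle index based Q-learning idea above can be sketched for a single arm: attach a subsidy to the passive action and learn action values under that subsidy. A minimal sketch with an assumed two-state chain (transition probabilities and rewards are illustrative, not the paper's model):

```python
import random

# Subsidy-based Q-learning for one restless arm, in the spirit of
# Whittle index learning. P[(state, action)] = Pr(next state is "good" (1));
# action 0 = passive, action 1 = active. All numbers are assumptions.
P = {
    (0, 0): 0.2, (0, 1): 0.7,
    (1, 0): 0.5, (1, 1): 0.95,
}

def reward(state, action, subsidy):
    # Reward is the state itself, plus the Whittle subsidy for staying passive.
    return state + (subsidy if action == 0 else 0.0)

def q_learn(subsidy, steps=50000, alpha=0.05, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    s = 1
    for _ in range(steps):
        # epsilon-greedy action selection
        if rng.random() < eps:
            a = rng.choice((0, 1))
        else:
            a = max((0, 1), key=lambda x: Q[(s, x)])
        s2 = 1 if rng.random() < P[(s, a)] else 0
        target = reward(s, a, subsidy) + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
    return Q
```

The Whittle index of a state is, roughly, the smallest subsidy at which the passive action becomes as attractive as the active one, i.e. where `Q[(s, 0)]` catches up with `Q[(s, 1)]`; a bisection over the subsidy can then recover it.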

Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems

A novel and efficient algorithm to compute the index-based solution for Streaming Bandits, which proves a phenomenon called index decay, whereby the Whittle index values are low for short residual lifetimes, driving the intuition underpinning the algorithm.

Networked Restless Multi-Armed Bandits for Mobile Interventions

This work proposes a new solution approach for networked RMABs, exploiting concavity properties which arise under natural assumptions on the structure of intervention effects, and demonstrates that it empirically outperforms state-of-the-art baselines in three mobile intervention domains using real-world graphs.

Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

This paper proposes a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality, and establishes differentiability of the Whittle index policy to support decision-focused learning.

Efficient Resource Allocation with Fairness Constraints in Restless Multi-Armed Bandits

Key theoretical properties of fair RMAB are demonstrated and it is experimentally demonstrated that the proposed methods handle fairness constraints without sacrificing significantly on solution quality.

Towards Soft Fairness in Restless Multi-Armed Bandits

The approach incorporates a softmax-based value iteration method in the RMAB setting to design selection algorithms that satisfy the proposed fairness constraint, and it provides theoretical performance guarantees, including asymptotic optimality.
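The softmax-based value iteration mentioned above can be sketched on a toy MDP: replace the hard max in the Bellman backup with a temperature-controlled log-sum-exp. The two-state MDP, rewards, and temperature below are illustrative assumptions, not the paper's model:

```python
import math

# Soft ("softmax") value iteration: V(s) = tau * log sum_a exp(Q(s,a) / tau).
# As tau -> 0 this recovers standard value iteration's hard max.
def soft_value_iteration(P, R, gamma=0.9, tau=1.0, iters=200):
    states = sorted({s for (s, _) in P})
    actions = sorted({a for (_, a) in P})
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {
            s: tau * math.log(sum(
                math.exp((R[(s, a)]
                          + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())) / tau)
                for a in actions))
            for s in states
        }
    return V

# Two states (0 = bad, 1 = good), two actions (0 = passive, 1 = active);
# P[(s, a)] maps next states to probabilities. Numbers are assumed.
P = {
    (0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.3, 1: 0.7},
    (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.05, 1: 0.95},
}
R = {(s, a): float(s) for (s, a) in P}
V = soft_value_iteration(P, R)
```

Acting with probabilities proportional to exp(Q(s, a)/tau), rather than greedily, spreads pulls across arms, which is what lets such schemes satisfy soft fairness constraints.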

Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Care Domain

A novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality is proposed, and it is observed that two-stage learning consistently converges to a slightly smaller predictive loss, while DF-Whittle outperforms two-stage on all solution quality evaluation metrics.

References

Showing 1-10 of 38 references

INDEXABILITY AND OPTIMAL INDEX POLICIES FOR A CLASS OF REINITIALISING RESTLESS BANDITS

  • S. Villar
  • Computer Science
    Probability in the Engineering and Informational Sciences
  • 2015
It is shown that the proposed Whittle index rule is optimal for the problem under study in the case of stochastically heterogeneous arms under the expected total criterion, and the optimal policy is further recovered by a simple tractable rule referred to as the 1-limited Round Robin rule.

On the Whittle Index for Restless Multiarmed Hidden Markov Bandits

This work analyzes the single-armed bandit and shows that, in general, it admits an approximate threshold-type optimal policy when there is a positive reward for the “no-sample” action, and it identifies several special cases for which the threshold policy is indeed the optimal policy.

Restless bandits with controlled restarts: Indexability and computation of Whittle index

This work presents detailed numerical experiments which suggest that Whittle index policy performs close to the optimal policy and performs significantly better than myopic policy, which is a commonly used heuristic.

Some indexable families of restless bandit problems

In 1988 Whittle introduced an important but intractable class of restless bandit problems which generalise the multiarmed bandit problems of Gittins by allowing state evolution for passive projects.

ON AN INDEX POLICY FOR RESTLESS BANDITS

We investigate the optimal allocation of effort to a collection of n projects. The projects are ‘restless' in that the state of a project evolves in time, whether or not it is allocated effort.

Restless bandits: activity allocation in a changing world

We consider a population of n projects which in general continue to evolve whether in operation or not (although by different rules). It is desired to choose the projects in operation at each instant…

Restless Poachers: Handling Exploration-Exploitation Tradeoffs in Security Domains

This paper formulates the problem as a restless multi-armed bandit (RMAB) model, provides two sufficient conditions for indexability along with an algorithm to numerically evaluate indexability, and proposes a binary-search-based algorithm to find the Whittle index policy efficiently.

Selective sensing of a heterogeneous population of units with dynamic health conditions

A fully integrated prognosis-driven selective sensing method that integrates prognostic models, collaborative learning, and sensing resource allocation to efficiently and economically monitor a large number of units by exploiting the similarity between them is developed.

Optimality of myopic scheduling and whittle indexability for energy harvesting sensors

It is shown that in some special cases, a myopic (or greedy) scheduling policy is optimal, and that such a policy coincides with the so called Whittle index policy.

Markov Decision Processes: Discrete Stochastic Dynamic Programming

  • M. Puterman
  • Computer Science
    Wiley Series in Probability and Statistics
  • 1994
Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.