# Multi-Armed Bandits: Theory and Applications to Online Learning in Networks

@inproceedings{Zhao2019MultiArmedBT, title={Multi-Armed Bandits: Theory and Applications to Online Learning in Networks}, author={Qing Zhao}, booktitle={Multi-Armed Bandits}, year={2019} }

**Abstract:** Multi-armed bandit problems pertain to optimal sequential decision making and learning in unknown environments. Since the first bandit problem posed by Thompson in 1933 for the application...
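As background for the Thompson (1933) problem mentioned in the abstract, the posterior-sampling idea it inspired can be sketched for Bernoulli arms. This is a minimal illustration, not taken from the book; the arm means, horizon, and function name are all illustrative.

```python
import random

def thompson_bernoulli(true_means, horizon, seed=0):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors.

    Illustrative sketch: sample a mean from each arm's posterior,
    play the argmax, then update that arm's Beta posterior.
    Returns the total reward collected over `horizon` pulls.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior successes + 1
    beta = [1] * k   # posterior failures + 1
    total = 0
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total += reward
    return total
```

Over a long horizon the posterior for the best arm concentrates and that arm is sampled highest almost every round, so the average reward approaches the best arm's mean.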

## 21 Citations

### Distributed No-Regret Learning in Multi-Agent Systems

- Computer Science, ArXiv
- 2020

An overview of new challenges and representative results on distributed no-regret learning in multi-agent systems modeled as repeated unknown games is given.

### Multi-Armed Bandits with Dependent Arms

- Computer Science, ArXiv
- 2020

Learning algorithms based on the UCB principle are developed that appropriately utilize the additional side observations afforded by dependent arms while performing the exploration-exploitation trade-off in the classical multi-armed bandit problem.

### Memory-Constrained No-Regret Learning in Adversarial Multi-Armed Bandits

- Computer Science, IEEE Transactions on Signal Processing
- 2021

This work appears to be the first on memory-constrained bandit problems in the adversarial setting; it uses a hierarchical learning framework that offers a sequence of operating points on the trade-off curve between regret order and memory complexity.

### Regret of Age-of-Information Bandits in Nonstationary Wireless Networks

- Computer Science, IEEE Wireless Communications Letters
- 2022

This work considers a wireless network in which a source periodically generates time-sensitive information and transmits it to a destination via one of $N$ non-stationary orthogonal wireless channels, and derives a lower bound on the AoI regret achievable by any policy.

### A Fast-Pivoting Algorithm for Whittle’s Restless Bandit Index

- Computer Science, Mathematics
- 2020

A new fast-pivoting algorithm is obtained that computes the $n$ Whittle index values of an $n$-state restless bandit by performing, after an initialization stage, $n$ steps that entail $(2/3)n^3 + O(n^2)$ arithmetic operations.

### NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

- Computer Science, NeurIPS
- 2021

It is shown that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems, which motivates using deep reinforcement learning to train NeurWIN.

### Optimal Order Simple Regret for Gaussian Process Bandits

- Computer Science, NeurIPS
- 2021

This work proves an $\tilde{O}(\sqrt{\gamma_N/N})$ bound on the simple regret of a pure exploration algorithm that is significantly tighter than the existing bounds, and that is order-optimal up to logarithmic factors in the cases where a lower bound on regret is known.

### On Information Gain and Regret Bounds in Gaussian Process Bandits

- Computer Science, AISTATS
- 2021

General bounds on $\gamma_T$ are provided based on the decay rate of the eigenvalues of the GP kernel; their specialization to commonly used kernels improves the existing bounds on $\gamma_T$ and, consequently, the regret bounds relying on $\gamma_T$ in numerous settings.

## References

Showing 1-10 of 129 references

### Discrete multi-armed bandits and multi-parameter processes

- Computer Science, Materials Science
- 1986

The general multi-armed bandit problem is reformulated and solved as a control problem over a partially ordered set, which provides a technically convenient framework for bandit-like problems and adds insight into the structure of strategies over partially ordered sets.

### Linear Programming for Finite State Multi-Armed Bandit Problems

- Computer Science, Mathematics, Math. Oper. Res.
- 1986

It is shown that when the state space is finite the computation of the dynamic allocation indices can be handled by linear programming methods.

### Robust Risk-Averse Stochastic Multi-armed Bandits

- Computer Science, ALT
- 2013

An algorithm, called RA-UCB, is provided to solve a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximizing some coherent risk measure criterion.

### Bandits with budgets

- Computer Science, 52nd IEEE Conference on Decision and Control
- 2013

This work derives regret bounds on the expected reward in such a bandit problem using a modification of the well-known upper confidence bound algorithm UCB1.
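The UCB1 algorithm that this entry builds on can be sketched as follows. This is the classical unbudgeted version, not the cited paper's modification; the arm means, horizon, and function name are illustrative.

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """Classical UCB1 for Bernoulli arms (sketch).

    After pulling each arm once, play the arm maximizing
    empirical mean + sqrt(2 ln t / n_i), the upper confidence bound.
    Returns the total reward and the per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    total = 0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

The confidence radius shrinks as an arm accumulates pulls, so suboptimal arms are sampled only $O(\log T)$ times, which yields the logarithmic regret that the budgeted variant in this entry adapts.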

### Generalized Bandit Problems

- Mathematics
- 2005

This chapter examines a number of extensions of the multi-armed bandit framework. We consider the possibility of an infinite number of available arms, we give conditions under which the Gittins index…

### The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret

- Computer Science, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2011

This work develops an original approach to the RMAB problem that is applicable when the corresponding Bayesian problem has the structure that the optimal solution is one of a prescribed finite set of policies, and develops a novel sensing policy for opportunistic spectrum access over unknown dynamic channels.

### Bandit problems with side observations

- Computer Science, IEEE Transactions on Automatic Control
- 2005

An extension of the traditional two-armed bandit problem is considered, in which the decision maker has access to some side information before deciding which arm to pull, and how much the additional information helps is quantified.

### The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

- Computer Science, Mathematics, J. Mach. Learn. Res.
- 2003

This work considers the multi-armed bandit problem under the PAC ("probably approximately correct") model, and generalizes the lower bound to a Bayesian setting and to the case where the statistics of the arms are known but the identities of the arms are not.

### Decentralized learning for multi-player multi-armed bandits

- Computer Science, 2012 IEEE 51st Conference on Decision and Control (CDC)
- 2012

An online index-based learning policy called the dUCB4 algorithm is proposed that trades off exploration and exploitation in the right way, and achieves expected regret that grows at most near-$O(\log^2 T)$.