Multi-Armed Bandits: Theory and Applications to Online Learning in Networks

@inproceedings{Zhao2019MultiArmedBT,
title={Multi-Armed Bandits: Theory and Applications to Online Learning in Networks},
author={Qing Zhao},
booktitle={Multi-Armed Bandits},
year={2019}
}
• Qing Zhao
• Published in Multi-Armed Bandits 21 November 2019
• Computer Science
Abstract Multi-armed bandit problems pertain to optimal sequential decision making and learning in unknown environments. Since the first bandit problem posed by Thompson in 1933 for the application...
21 Citations
• Computer Science
ArXiv
• 2020
An overview of new challenges and representative results on distributed no-regret learning in multi-agent systems modeled as repeated unknown games is given.
• Computer Science
ArXiv
• 2020
Learning algorithms based on the UCB principle are developed which utilize these additional side observations appropriately while balancing the exploration-exploitation trade-off in the classical multi-armed bandit problem.
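The UCB principle referenced in the snippet above can be illustrated with a minimal UCB1 sketch on Bernoulli arms. This is a generic textbook version, not the cited paper's side-observation variant; the arm means and horizon are illustrative assumptions.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Minimal UCB1 sketch on Bernoulli arms (illustrative only).

    Each round plays the arm maximizing empirical mean plus an
    exploration bonus sqrt(2 ln t / n_i), the classic UCB1 index.
    Returns per-arm pull counts and the total reward collected.
    """
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n     # pulls per arm
    sums = [0.0] * n     # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # initialization: play each arm once
        else:
            arm = max(
                range(n),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return counts, total
```

Over a long enough horizon the exploration bonus shrinks and the empirically best arm dominates the pull counts, which is the exploration-exploitation behavior the snippet refers to.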
• Computer Science
IEEE Transactions on Signal Processing
• 2021
This work appears to be the first on memory-constrained bandit problems in the adversarial setting using a hierarchical learning framework that offers a sequence of operating points on the tradeoff curve between the regret order and memory complexity.
• Computer Science
IEEE Wireless Communications Letters
• 2022
This work considers a wireless network with a source periodically generating time-sensitive information and transmitting it to a destination via one of $N$ non-stationary orthogonal wireless channels, and derives a lower bound on the AoI regret achievable by any policy.
A new fast-pivoting algorithm is obtained that computes the $n$ Whittle index values of an $n$-state restless bandit by performing, after an initialization stage, $n$ steps that entail $(2/3)n^3 + O(n^2)$ arithmetic operations.
• Computer Science
NeurIPS
• 2021
It is shown that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems, which motivates using deep reinforcement learning for the training of NeurWIN.
• Computer Science
NeurIPS
• 2021
This work proves an $\tilde{O}(\sqrt{\gamma_N/N})$ bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds and is order optimal up to logarithmic factors for the cases where a lower bound on regret is known.
• Computer Science
AISTATS
• 2021
General bounds on $\gamma_T$ are provided based on the decay rate of the eigenvalues of the GP kernel; their specialisation to commonly used kernels improves the existing bounds on $\gamma_T$ and, consequently, the regret bounds relying on $\gamma_T$ under numerous settings.

References

SHOWING 1-10 OF 129 REFERENCES

The general multi-armed bandit problem is reformulated and solved as a control problem over a partially ordered set to provide a technically convenient framework for bandit-like problems and adds insight to the structure of strategies over partially ordered sets.
• Computer Science, Mathematics
Math. Oper. Res.
• 1986
It is shown that when the state space is finite the computation of the dynamic allocation indices can be handled by linear programming methods.
An algorithm, called RA-UCB, is provided to solve a variant of the standard stochastic multi-armed bandit problem when one is not interested in the arm with the best mean, but instead in the arm maximizing some coherent risk measure criterion.
• Computer Science
52nd IEEE Conference on Decision and Control
• 2013
This work derives regret bounds on the expected reward in such a bandit problem using a modification of the well-known upper confidence bound algorithm UCB1.
This chapter examines a number of extensions of the multi-armed bandit framework. We consider the possibility of an infinite number of available arms, and we give conditions under which the Gittins index…
• Computer Science
2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
• 2011
This work develops an original approach to the RMAB problem that is applicable when the corresponding Bayesian problem has the structure that the optimal solution is one of a prescribed finite set of policies, and develops a novel sensing policy for opportunistic spectrum access over unknown dynamic channels.
• Computer Science
IEEE Transactions on Automatic Control
• 2005
An extension of the traditional two-armed bandit problem is considered, in which the decision maker has access to some side information before deciding which arm to pull and how much the additional information helps is quantified.
• Computer Science, Mathematics
J. Mach. Learn. Res.
• 2003
This work considers the multi-armed bandit problem under the PAC ("probably approximately correct") model and generalizes the lower bound to a Bayesian setting, and to the case where the statistics of the arms are known but the identities of the arms are not.
• Computer Science
2012 IEEE 51st IEEE Conference on Decision and Control (CDC)
• 2012
An online index-based learning policy called the dUCB4 algorithm is proposed that trades off exploration vs. exploitation in the right way, and achieves expected regret that grows at most as near-$O(\log^2 T)$.