Corpus ID: 237562990

Online Learning of Network Bottlenecks via Minimax Paths

@article{kerblom2021OnlineLO,
  title={Online Learning of Network Bottlenecks via Minimax Paths},
  author={Niklas {\AA}kerblom and Fazeleh Sadat Hoseini and Morteza Haghir Chehreghani},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.08467}
}
In this paper, we study bottleneck identification in networks via extracting minimax paths. Many real-world networks have stochastic weights for which full knowledge is not available in advance. Therefore, we model this task as a combinatorial semi-bandit problem to which we apply a combinatorial version of Thompson Sampling and establish an upper bound on the corresponding Bayesian regret. Due to the computational intractability of the problem, we then devise an alternative problem formulation…
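
As a rough illustration of the minimax-path notion the abstract refers to, the sketch below finds a path between two nodes whose largest edge weight (the bottleneck) is as small as possible, using a Dijkstra-style search on an undirected graph. It is not taken from the paper: the function name `minimax_path`, the edge-list input format, and the fixed deterministic weights are assumptions made for this example. In a Thompson Sampling setting, one would presumably draw the edge weights from the current posterior before each such call.

```python
import heapq


def minimax_path(edges, source, target):
    """Return (bottleneck, path) minimizing the largest edge weight
    on any source-target path, or None if target is unreachable.

    `edges` is a list of (u, v, weight) tuples for an undirected graph
    (a hypothetical input format chosen for this sketch).
    """
    # Build an adjacency list for the undirected graph.
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))

    # best[node] = smallest bottleneck found so far on a path from source.
    best = {source: float("-inf")}
    parent = {source: None}
    heap = [(float("-inf"), source)]

    while heap:
        bottleneck, node = heapq.heappop(heap)
        if node == target:
            break
        if bottleneck > best[node]:
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, []):
            # The bottleneck of the extended path is the larger of the two.
            candidate = max(bottleneck, weight)
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if target not in best:
        return None

    # Walk parent pointers back from the target to recover the path.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[target], path[::-1]


example_edges = [("a", "b", 4), ("b", "d", 1), ("a", "c", 2), ("c", "d", 3)]
result = minimax_path(example_edges, "a", "d")
```

In this toy graph, the route through `b` has a largest edge weight of 4 and the route through `c` has a largest edge weight of 3, so the search prefers the route through `c`.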
1 Citation

Online Learning of Energy Consumption for Navigation of Electric Vehicles
TLDR
This work employs a Bayesian approach to model the energy consumption at road segments for efficient navigation of electric vehicles, develops an online learning framework, and investigates several exploration strategies such as Thompson Sampling and Upper Confidence Bound.

References

SHOWING 1-10 OF 36 REFERENCES
Learning to Optimize via Posterior Sampling
TLDR
A Bayesian regret bound for posterior sampling is established that applies broadly and can be specialized to many model classes; it depends on a new notion the authors refer to as the eluder dimension, which measures the degree of dependence among action rewards.
Adaptive shortest-path routing under unknown and stochastically varying link states
  • K. Liu, Qing Zhao
  • Computer Science
  • 2012 10th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt)
  • 2012
TLDR
By exploiting arm dependencies, a regret that is polynomial in the network size can be achieved while maintaining the optimal logarithmic order in time; the results find applications in cognitive radio and ad hoc networks with unknown and dynamic communication environments.
Online shortest path routing: The value of information
TLDR
This paper studies online shortest path routing over dynamic multi-hop networks as a combinatorial bandit optimization problem and derives the tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy.
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis
TLDR
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem is answered positively for the case of Bernoulli rewards by providing the first finite-time analysis that matches the asymptotic rate given in the Lai and Robbins lower bound for the cumulative regret.
Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations
TLDR
New efficient policies are shown to achieve regret that grows logarithmically with time, and polynomially in the number of unknown variables, for this combinatorial multi-armed bandit problem.
Combinatorial Multi-Armed Bandit with General Reward Functions
TLDR
The stochastic combinatorial multi-armed bandit (CMAB) framework is studied in a form that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables.
Thompson Sampling for Combinatorial Semi-Bandits
  • Siwei Wang, Wei Chen
  • Computer Science, Mathematics
  • ICML
  • 2018
TLDR
The first distribution-dependent regret bound of O(mK_{\max}\log T / \Delta_{\min}) is obtained, and it is shown that one cannot directly replace the exact offline oracle with an approximation oracle in the TS algorithm, even for the classical MAB problem.
On Bayesian Upper Confidence Bounds for Bandit Problems
TLDR
It is proved that the corresponding algorithm, termed BayesUCB, satisfies finite-time regret bounds that imply its asymptotic optimality and gives a general formulation for a class of Bayesian index policies that rely on quantiles of the posterior distribution.
Combinatorial Bandits
TLDR
A variant of a strategy by Dani, Hayes and Kakade is introduced, achieving a regret bound that, for a variety of concrete choices of S, is of order \sqrt{nd \ln|S|}, where n is the time horizon.
Analysis of Thompson Sampling for the Multi-armed Bandit Problem
TLDR
For the first time, it is shown that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem.