Corpus ID: 210860791

Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting

Zixin Zhong, Wang Chi Cheung and Vincent Yan Fu Tan. In: International Conference on Machine Learning.
We design and analyze CascadeBAI, an algorithm for finding the best set of $K$ items, also called an arm, within the framework of cascading bandits. An upper bound on the time complexity of CascadeBAI is derived by overcoming a crucial analytical challenge, namely, that of probabilistically estimating the amount of available feedback at each step. To do so, we define a new class of random variables (r.v.'s), which we term left-sided sub-Gaussian r.v.'s; these are r.v.'s whose cumulant… 
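The partial-feedback structure the abstract refers to can be sketched in a few lines: the user scans the chosen items in order and we observe click outcomes only up to the first click. The simulator below is an illustrative assumption, not code from the CascadeBAI paper; the function name and probabilities are hypothetical.

```python
import random

def cascade_feedback(arm, click_probs, rng):
    """Simulate one round of cascading-click feedback.

    The user examines the K items in `arm` in order and clicks the first
    attractive one; Bernoulli feedback is observed only for the items
    examined up to (and including) that click, so the amount of feedback
    per step is itself random.
    """
    observed = []
    for item in arm:
        clicked = rng.random() < click_probs[item]
        observed.append((item, clicked))
        if clicked:  # user stops scanning after the first click
            break
    return observed

rng = random.Random(0)
click_probs = {0: 0.1, 1: 0.8, 2: 0.5}
fb = cascade_feedback([0, 1, 2], click_probs, rng)
```

Because scanning stops at the first click, later items in the arm may receive no feedback at all, which is exactly the quantity CascadeBAI's analysis must estimate probabilistically.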


Best Arm Identification in Restless Markov Multi-Armed Bandits

A sequential policy that forcibly selects an arm that has not been selected for R consecutive time instants is proposed, and it is shown that this policy achieves an upper bound that depends on R and is monotonically non-increasing in R.

Optimal Clustering with Bandit Feedback

Junwen Yang (Institute of Operations Research and Analytics, National University of Singapore) and Zixin Zhong (Department of Electrical and Computer Engineering, …)

Fast Pure Exploration via Frank-Wolfe

Frank-Wolfe-based Sampling (FWS) is devised, a simple algorithm whose sample complexity matches the lower bounds for a wide class of pure exploration problems and that is competitive with state-of-the-art algorithms.

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

The first polynomial-time adaptive algorithm is designed, which simultaneously handles limited feedback, general reward functions, and combinatorial action spaces (e.g., matroids, matchings, and s-t paths), together with a sample complexity analysis.

Probabilistic Sequential Shrinking: A Best Arm Identification Algorithm for Stochastic Bandits with Corruptions

A novel randomized algorithm, Probabilistic Sequential Shrinking (PSS), is proposed; it is agnostic to the amount of corruption and performs better than its deterministic analogue, the Successive Halving algorithm of Karnin et al. (2013).

Combinatorial Pure Exploration with Full-bandit Feedback and Beyond: Solving Combinatorial Optimization under Uncertainty with Limited Observation

This work reviews recently proposed techniques for combinatorial pure exploration in multi-armed bandits with limited feedback, including the full-bandit and semi-bandit settings.

Top Arm Identification in Multi-Armed Bandits with Batch Arm Pulls

This paper develops and analyzes algorithms for batch multi-armed bandits (MABs) and top arm identification, in both the fixed confidence and fixed budget settings, and shows that the batch constraint does not significantly increase the sample complexity of top arm identification compared to unconstrained MAB algorithms.

PAC Bounds for Multi-armed Bandit and Markov Decision Processes

The bandit problem is revisited and considered under the PAC model, and it is shown that, given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ.
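For a sense of scale, the quoted bound can be evaluated numerically. The snippet below is only an order-of-magnitude illustration: the leading constant c is a placeholder assumption, since the O(·) notation in the summary hides it.

```python
import math

def pac_pull_budget(n, eps, delta, c=1.0):
    """Illustrative pull budget from the O((n / eps^2) * log(1/delta))
    PAC bound; the constant factor c is hypothetical, not from the paper."""
    return c * (n / eps ** 2) * math.log(1.0 / delta)

# 10 arms, eps = 0.1, delta = 0.05: roughly (10 / 0.01) * log(20) pulls
budget = pac_pull_budget(n=10, eps=0.1, delta=0.05)
```

Note the characteristic 1/ε² dependence: halving the accuracy target ε quadruples the number of pulls, while tightening δ costs only logarithmically.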

A Variant of Azuma's Inequality for Martingales with Subgaussian Tails

We provide a variant of Azuma's concentration inequality for martingales, in which the standard boundedness requirement is replaced by the milder requirement of a subgaussian tail.

Gamification of Pure Exploration for Linear Bandits

This work designs the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits, which naturally bypasses a pitfall caused by simple but difficult instances that most prior algorithms had to be engineered to handle explicitly.

Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback

This study designs a polynomial-time approximation algorithm for a 0-1 quadratic programming problem arising in confidence ellipsoid maximization and proposes a bandit algorithm whose computation time is O(log K), thereby achieving an exponential speedup over linear bandit algorithms.

Combinatorial Bandits with Full-Bandit Feedback: Sample Complexity and Regret Minimization

The CSAR algorithm is presented, which is a generalization of the SAR algorithm (Bubeck et al. 2013) for the combinatorial setting, and an efficient sampling scheme that uses Hadamard matrices in order to estimate accurately the individual arms' expected rewards is presented.

Top-$k$ Combinatorial Bandits with Full-Bandit Feedback

This work presents the Combinatorial Successive Accepts and Rejects (CSAR) algorithm, which generalizes SAR (Bubeck et al., 2013) to top-k combinatorial bandits, and presents an efficient sampling scheme that uses Hadamard matrices in order to estimate the individual arms' expected rewards accurately.

Primer of Applied Regression & Analysis of Variance

Chapter overview: Why Do Multivariate Analysis?; The First Step: Understanding Simple Linear Regression; Regression With Two or More Independent Variables; Do the Data Fit the Assumptions?; Multicollinearity and What

A Thompson Sampling Algorithm for Cascading Bandits

Empirical experiments demonstrate the superiority of TS-Cascade over existing UCB-based procedures in terms of expected cumulative regret and time complexity; the work also provides the first theoretical guarantee for a Thompson sampling algorithm on any stochastic combinatorial bandit problem with partial feedback.