• Corpus ID: 67856507

From PAC to Instance-Optimal Sample Complexity in the Plackett-Luce Model

Aadirupa Saha and Aditya Gopalan. International Conference on Machine Learning.
We consider PAC-learning a good item from $k$-subsetwise feedback information sampled from a Plackett-Luce probability model, with instance-dependent sample complexity performance. In the setting where subsets of a fixed size can be tested and top-ranked feedback is made available to the learner, we give an algorithm with optimal instance-dependent sample complexity, for PAC best arm identification, of $O\bigg(\frac{\theta_{[k]}}{k}\sum_{i = 2}^n\max\Big(1,\frac{1}{\Delta_i^2}\Big) \ln\frac{k… 
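As an illustration of the top-ranked (winner) feedback described above, here is a minimal sketch, not the paper's algorithm. It assumes the standard Plackett-Luce choice rule, under which item $i$ in a played subset $S$ wins with probability $\theta_i / \sum_{j \in S} \theta_j$. The score vector `theta`, the subset, and the helper names `sample_winner` and `empirical_win_rates` are hypothetical and chosen only for this example.

```python
import random


def sample_winner(theta, subset, rng):
    """Draw the top-ranked item of `subset` under a Plackett-Luce model.

    Item i wins with probability theta[i] / sum(theta[j] for j in subset).
    """
    weights = [theta[i] for i in subset]
    return rng.choices(subset, weights=weights, k=1)[0]


def empirical_win_rates(theta, subset, num_rounds, seed=0):
    """Play the same subset repeatedly and return each item's win frequency."""
    rng = random.Random(seed)
    counts = {i: 0 for i in subset}
    for _ in range(num_rounds):
        counts[sample_winner(theta, subset, rng)] += 1
    return {i: c / num_rounds for i, c in counts.items()}


# Hypothetical instance: four items, item 0 has the largest score.
theta = [1.0, 0.6, 0.3, 0.1]
subset = [0, 1, 3]  # a subset of size k = 3
rates = empirical_win_rates(theta, subset, num_rounds=20000)
```

With these assumed scores, the win frequency of item 0 should concentrate around $1.0 / (1.0 + 0.6 + 0.1) \approx 0.59$ as the number of rounds grows.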


The Sample Complexity of Best-k Items Selection from Pairwise Comparisons

This paper studies sample complexity (i.e., number of comparisons) bounds for active best-$k$ items selection from pairwise comparisons and proposes two algorithms based on PAC best items selection algorithms, one of which works for $k=1$ and is sample complexity optimal up to a loglog factor.

On Sample Complexity Upper and Lower Bounds for Exact Ranking from Noisy Comparisons

This paper targets exact ranking without prior knowledge of the instances, whereas most previous works either focus on approximate rankings or study exact ranking but require prior knowledge.

Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

This work proposes a novel reduction from any (general) dueling bandits problem to multi-armed bandits; despite its simplicity, the reduction improves many existing results in dueling bandits.

Versatile Dueling Bandits: Best-of-both World Analyses for Online Learning from Relative Preferences

The robustness of the proposed algorithm is justified by proving its optimal regret rate under adversarially corrupted preferences—this outperforms the existing state-of-the-art corrupted dueling results by a large margin.

Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

A generic algorithm that covers the full spectrum of conceivable arm elimination strategies, from aggressive to conservative, is suggested. Theoretical questions about the sufficient and necessary budget for the algorithm to choose the best arm are answered and complemented by lower bounds for any learning algorithm in this problem scenario.

Sample Complexity Bounds for Active Ranking from Multi-wise Comparisons

This work helps understand whether and to what degree multi-wise comparisons can reduce the sample complexity of ranking problems compared to ranking from pairwise comparisons.

Identification of the Generalized Condorcet Winner in Multi-dueling Bandits

The Dvoretzky–Kiefer–Wolfowitz tournament (DKWT) algorithm is proposed, which proves to be nearly optimal and empirically outperforms current state-of-the-art algorithms, even in the special case of dueling bandits or under a Plackett-Luce assumption on the feedback mechanism.

One Arrow, Two Kills: An Unified Framework for Achieving Optimal Regret Guarantees in Sleeping Bandits

This work introduces a new notion of Internal Regret for sleeping MAB and proposes an algorithm that yields sublinear regret in that measure, even for a completely adversarial sequence of losses and availabilities.

Online Elicitation of Necessarily Optimal Matchings

This paper investigates the elicitation of necessarily Pareto optimal (NPO) and necessarily rank-maximal (NRM) matchings, answers an open question, and gives an online algorithm for eliciting an NRM matching in the next-best query model that is 3/2-competitive.

ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

An elimination-based rescheduling algorithm is developed and shown to achieve a near-optimal dynamic regret bound in terms of $S^{\mathrm{CW}}$, the number of times the Condorcet winner changes in $T$ rounds.



PAC Battling Bandits in the Plackett-Luce Model

Two algorithms are proposed for the PAC problem with the TR feedback model, with optimal (up to logarithmic factors) sample complexity guarantees, establishing the increase in statistical efficiency from exploiting rank-ordered feedback.

Active Ranking with Subset-wise Preferences

These algorithms rely on a novel pivot trick to maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score estimates used in prior work.

PAC-Battling Bandits with Plackett-Luce: Tradeoff between Sample Complexity and Subset Size

We introduce the probably approximately correct (PAC) version of the problem of Battling-bandits with the Plackett-Luce (PL) model – an online learning framework where in each trial, the learner

PAC Ranking from Pairwise and Listwise Queries: Lower Bounds and Upper Bounds

This paper derives a lower bound on the sample complexity (aka number of queries), and proposes an algorithm that is sample-complexity-optimal up to an $O(\log(k+l)/\log{k})$ factor and designs ranking algorithms that recover the top-$k$ or total ranking using as few queries as possible.

Optimal Sample Complexity of M-wise Data for Top-K Ranking

This work examines an M-wise comparison model that builds on the Plackett-Luce model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities.
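A minimal sketch of how a full ranking can arise from noisy utilities, assuming the standard random-utility view of Plackett-Luce: adding independent standard Gumbel noise to each log-score and sorting produces a Plackett-Luce ranking. This is an illustration under that assumption, not necessarily the exact noise model of the cited work; `theta`, `sample_ranking`, and `top_rate` are hypothetical names.

```python
import math
import random


def sample_ranking(theta, items, rng):
    """Rank `items` by noisy utilities log(theta[i]) + standard Gumbel noise."""
    def noisy_utility(i):
        u = rng.random()
        while u == 0.0:  # avoid log(0); rng.random() lies in [0, 1)
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        return math.log(theta[i]) + gumbel

    utilities = {i: noisy_utility(i) for i in items}
    # Highest perceived utility comes first.
    return sorted(items, key=lambda i: utilities[i], reverse=True)


def top_rate(theta, items, target, num_samples, seed=0):
    """Fraction of sampled rankings in which `target` is ranked first."""
    rng = random.Random(seed)
    hits = sum(
        sample_ranking(theta, items, rng)[0] == target
        for _ in range(num_samples)
    )
    return hits / num_samples


# Hypothetical scores for M = 3 items.
theta = [1.0, 0.6, 0.3]
rate = top_rate(theta, [0, 1, 2], target=0, num_samples=20000)
```

Under this assumption, item 0 should be ranked first with frequency close to $1.0 / (1.0 + 0.6 + 0.3) \approx 0.53$.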

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model

This work designs a new active ranking algorithm without using any information about the underlying items' preference scores, and establishes a matching lower bound on the sample complexity even when the set of preference scores is given to the algorithm.

PAC Subset Selection in Stochastic Multi-armed Bandits

The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst case sample complexity of PAC algorithms for Explore-m is given.

The K-armed Dueling Bandits Problem

Competitive analysis of the top-K ranking problem

A linear-time algorithm is presented that has a competitive ratio of $O(\sqrt{n})$, i.e., it needs at most $O(\sqrt{n})$ times as many samples as the best possible algorithm for that instance of top-K. This is shown to be tight: any algorithm for the top-K problem has competitive ratio at least $\Omega(\sqrt{n})$.

On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models

This work introduces generic notions of complexity for the two dominant frameworks considered in the literature: fixed-budget and fixed-confidence settings, and provides the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities and holds when m ≥ 1 under general assumptions.