Best-item Learning in Random Utility Models with Subset Choices
@inproceedings{Saha2020BestitemLI, title={Best-item Learning in Random Utility Models with Subset Choices}, author={Aadirupa Saha and Aditya Gopalan}, booktitle={International Conference on Artificial Intelligence and Statistics}, year={2020} }
We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the minimum advantage, that helps in characterizing the complexity of separating pairs of items based…
6 Citations
Choice Bandits
- Computer Science, NeurIPS
- 2020
An algorithm for choice bandits, termed Winner Beats All (WBA), with a distribution-dependent $O(\log T)$ regret bound under all these choice models is proposed, which is competitive with previous dueling bandit algorithms and outperforms the recently proposed MaxMinUCB algorithm designed for the MNL model.
Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability
- Computer Science, ALT
- 2022
A new algorithm is provided that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
- Computer Science, ICML
- 2022
The proposed algorithms are provably optimal, as justified by matching lower bounds, and the results extend to dynamic regret guarantees for other notions of dueling bandits regret, including Condorcet regret, best-response bounds, and Borda regret.
Identification of the Generalized Condorcet Winner in Multi-dueling Bandits
- Computer Science, NeurIPS
- 2021
The Dvoretzky–Kiefer–Wolfowitz tournament (DKWT) algorithm is proposed, which proves to be nearly optimal and empirically outperforms current state-of-the-art algorithms, even in the special case of dueling bandits or under a Plackett-Luce assumption on the feedback mechanism.
ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits
- Computer Science, ArXiv
- 2022
An elimination-based rescheduling algorithm is developed and shown to achieve a near-optimal dynamic regret bound, where $S^{CW}$ is the number of times the Condorcet winner changes in $T$ rounds.
Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits
- Computer Science, AISTATS
- 2022
The Correlated Preference Bandits problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of n items through online subsetwise preference feedback, is introduced, along with a new class of Block-Rank based RUM models.
References
SHOWING 1-10 OF 45 REFERENCES
Active Ranking with Subset-wise Preferences
- Computer Science, AISTATS
- 2019
These algorithms rely on a novel pivot trick to maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score estimates that have been used in prior work.
PAC Ranking from Pairwise and Listwise Queries: Lower Bounds and Upper Bounds
- Computer Science, ArXiv
- 2018
This paper derives a lower bound on the sample complexity (aka number of queries), and proposes an algorithm that is sample-complexity-optimal up to an $O(\log(k+l)/\log{k})$ factor and designs ranking algorithms that recover the top-$k$ or total ranking using as few queries as possible.
PAC Battling Bandits in the Plackett-Luce Model
- Computer Science, ALT
- 2019
Two algorithms are proposed for the PAC problem with the TR feedback model with optimal (up to logarithmic factors) sample complexity guarantees, establishing the increase in statistical efficiency from exploiting rank-ordered feedback.
A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model
- Computer Science, SODA
- 2018
This work designs a new active ranking algorithm without using any information about the underlying items' preference scores, and establishes a matching lower bound on the sample complexity even when the set of preference scores is given to the algorithm.
Optimal Sample Complexity of M-wise Data for Top-K Ranking
- Computer Science, NIPS
- 2017
This work examines an M-wise comparison model that builds on the Plackett-Luce model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities.
Top-k Selection based on Adaptive Sampling of Noisy Preferences
- Computer Science, ICML
- 2013
This work proposes and formally analyzes a general preference-based racing algorithm that is instantiated with three specific ranking procedures and corresponding sampling schemes, and assumes that alternatives can be compared in terms of pairwise preferences.
PAC Subset Selection in Stochastic Multi-armed Bandits
- Computer Science, ICML
- 2012
The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst case sample complexity of PAC algorithms for Explore-m is given.
Competitive analysis of the top-K ranking problem
- Computer Science, SODA
- 2017
A linear-time algorithm is presented that has a competitive ratio of $O(\sqrt{n})$, i.e., it needs at most $O(\sqrt{n})$ times as many samples as the best possible algorithm for that instance of top-K; this is shown to be tight: any algorithm for the top-K problem has competitive ratio at least $\Omega(\sqrt{n})$.
Learning Mixtures of Random Utility Models
- Computer Science, AAAI
- 2018
The problem of identifiability and efficient learning of mixtures of Random Utility Models (RUMs) is tackled, and it is shown that when the PDFs of utility distributions are symmetric, the mixture of k RUMs is not identifiable when the number of alternatives m is no more than 2k-1, but when m ≥ max{4k-2,6}, any k-RUM is generically identifiable.
Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows
- Computer Science, ICML
- 2014
This work addresses the problem of rank elicitation assuming that the underlying data generating process is characterized by a probability distribution on the set of all rankings of a given set of items, and allows the learner to query pairwise preferences.