Corpus ID: 211171591

Best-item Learning in Random Utility Models with Subset Choices

@inproceedings{saha-gopalan-rum-subset,
  title={Best-item Learning in Random Utility Models with Subset Choices},
  author={Aadirupa Saha and Aditya Gopalan},
  booktitle={International Conference on Artificial Intelligence and Statistics},
}
We consider the problem of PAC learning the most valuable item from a pool of $n$ items using sequential, adaptively chosen plays of subsets of $k$ items, when, upon playing a subset, the learner receives relative feedback sampled according to a general Random Utility Model (RUM) with independent noise perturbations to the latent item utilities. We identify a new property of such a RUM, termed the minimum advantage, that helps in characterizing the complexity of separating pairs of items based… 
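The feedback model in the abstract can be made concrete with a minimal simulation sketch: each play of a subset returns the item whose latent utility plus independent noise is largest. Gaussian noise is used here purely as one instance of a RUM (Gumbel noise would recover the MNL/Plackett-Luce special case); the item count, utilities, and subset below are illustrative, not from the paper.

```python
import random


def rum_winner(utilities, subset, noise=lambda: random.gauss(0, 1)):
    """Play a subset: each item's latent utility receives an independent
    noise perturbation, and the item with the highest perturbed utility wins."""
    perturbed = {i: utilities[i] + noise() for i in subset}
    return max(perturbed, key=perturbed.get)


# Hypothetical pool of n = 5 items with latent utilities.
theta = [0.0, 0.2, 0.4, 0.6, 1.0]

random.seed(0)
wins = [0] * len(theta)
for _ in range(10000):
    wins[rum_winner(theta, subset=[1, 3, 4])] += 1

# The highest-utility item in the played subset (item 4) wins most often,
# which is the signal a best-item learner aggregates across adaptive plays.
```

A learner only ever observes such winner draws, never the utilities `theta` themselves; separating two items is harder the closer their utilities are, which is the intuition behind the paper's "minimum advantage" quantity.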


Choice Bandits

An algorithm for choice bandits, termed Winner Beats All (WBA), is proposed with a distribution-dependent O(log T) regret bound under a broad family of choice models; it is competitive with previous dueling-bandit algorithms and outperforms the recently proposed MaxMinUCB algorithm designed for the MNL model.

Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

A new algorithm is provided that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.

Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits

The proposed algorithms are provably optimal, as justified by matching lower bounds on dynamic regret for several notions of dueling-bandit regret, including Condorcet regret, best-response regret, and Borda regret.

Identification of the Generalized Condorcet Winner in Multi-dueling Bandits

The Dvoretzky–Kiefer–Wolfowitz tournament (DKWT) algorithm is proposed, proven to be nearly optimal, and shown to empirically outperform current state-of-the-art algorithms, even in the special case of dueling bandits or under a Plackett-Luce assumption on the feedback mechanism.

ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

An elimination-based rescheduling algorithm is developed and shown to achieve a near-optimal dynamic regret bound, where $S_{CW}$ is the number of times the Condorcet winner changes in $T$ rounds.

Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits

The Correlated Preference Bandits problem with random-utility-based choice models (RUMs) is introduced, where the goal is to identify the best item from a given pool of $n$ items through online subsetwise preference feedback, along with a new class of Block-Rank based RUMs.

Active Ranking with Subset-wise Preferences

These algorithms rely on a novel 'pivot trick' to maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score estimates used in prior work.

PAC Ranking from Pairwise and Listwise Queries: Lower Bounds and Upper Bounds

This paper derives a lower bound on the sample complexity (i.e., number of queries), proposes an algorithm that is sample-complexity-optimal up to an $O(\log(k+l)/\log k)$ factor, and designs ranking algorithms that recover the top-$k$ items or the total ranking using as few queries as possible.

PAC Battling Bandits in the Plackett-Luce Model

Two algorithms are proposed for the PAC problem with the TR feedback model, with optimal (up to logarithmic factors) sample complexity guarantees, establishing the gain in statistical efficiency from exploiting rank-ordered feedback.

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model

This work designs a new active ranking algorithm without using any information about the underlying items' preference scores, and establishes a matching lower bound on the sample complexity even when the set of preference scores is given to the algorithm.

Optimal Sample Complexity of M-wise Data for Top-K Ranking

This work examines an M-wise comparison model that builds on the Plackett-Luce model where for each sample, M items are ranked according to their perceived utilities modeled as noisy observations of their underlying true utilities.

Top-k Selection based on Adaptive Sampling of Noisy Preferences

This work proposes and formally analyzes a general preference-based racing algorithm, instantiated with three specific ranking procedures and corresponding sampling schemes, under the assumption that alternatives can be compared in terms of pairwise preferences.

PAC Subset Selection in Stochastic Multi-armed Bandits

The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst case sample complexity of PAC algorithms for Explore-m is given.

Competitive analysis of the top-K ranking problem

A linear-time algorithm is presented with a competitive ratio of O(√n), i.e., it needs at most O(√n) times as many samples as the best possible algorithm for that instance of top-K; this is shown to be tight, as any algorithm for the top-K problem has competitive ratio at least Ω(√n).

Learning Mixtures of Random Utility Models

The problem of identifiability and efficient learning of mixtures of Random Utility Models (RUMs) is tackled, and it is shown that when the PDFs of utility distributions are symmetric, the mixture of k RUMs is not identifiable when the number of alternatives m is no more than 2k-1, but when m ≥ max{4k-2,6}, any k-RUM is generically identifiable.

Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows

This work addresses the problem of rank elicitation assuming that the underlying data generating process is characterized by a probability distribution on the set of all rankings of a given set of items, and allows the learner to query pairwise preferences.