Corpus ID: 4760582

Top-k Selection based on Adaptive Sampling of Noisy Preferences

@inproceedings{BusaFekete2013TopkSB,
  title={Top-k Selection based on Adaptive Sampling of Noisy Preferences},
  author={R. Busa-Fekete and Bal{\'a}zs Sz{\"o}r{\'e}nyi and Weiwei Cheng and Paul Weng and E. H{\"u}llermeier},
  booktitle={ICML},
  year={2013}
}
We consider the problem of reliably selecting an optimal subset of fixed size from a given set of choice alternatives, based on noisy information about the quality of these alternatives. Problems of a similar kind have been tackled by means of adaptive sampling schemes called racing algorithms. However, in contrast to existing approaches, we do not assume that each alternative is characterized by a real-valued random variable, and that samples are taken from the corresponding distributions…
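To make the setting concrete, here is a minimal Python sketch of a racing-style top-k selection from noisy pairwise preferences. It is an illustration under simplifying assumptions, not the algorithm of the paper: items are scored by their empirical probability of beating a uniformly random opponent (a Borda-style score) and raced with anytime Hoeffding confidence bounds. The function race_top_k, the simulated preference matrix prefs, and all constants are hypothetical.

import math
import random

def race_top_k(prefs, k, delta=0.05, max_rounds=100000):
    # Racing-style top-k selection from noisy pairwise duels.
    # NOT the algorithm of Busa-Fekete et al. (2013); a simplified sketch.
    # prefs[i][j] is the true probability that item i beats item j; it is
    # used only to simulate noisy comparisons, never read by the learner.
    n = len(prefs)
    wins, pulls = [0] * n, [0] * n
    active, selected = set(range(n)), set()

    for t in range(1, max_rounds + 1):
        # One noisy duel per undecided item against a random opponent.
        for i in active:
            j = random.choice([x for x in range(n) if x != i])
            if random.random() < prefs[i][j]:
                wins[i] += 1
            pulls[i] += 1

        # Anytime Hoeffding radius, union-bounded over items and rounds.
        rad = {i: math.sqrt(math.log(4 * n * t * t / delta) / (2 * pulls[i]))
               for i in active}
        lcb = {i: wins[i] / pulls[i] - rad[i] for i in active}
        ucb = {i: wins[i] / pulls[i] + rad[i] for i in active}

        k_rem = k - len(selected)
        act = list(active)
        for i in act:
            beats = sum(lcb[i] > ucb[j] for j in act if j != i)
            beaten = sum(ucb[i] < lcb[j] for j in act if j != i)
            if beats >= len(act) - k_rem:   # provably among the top k_rem
                selected.add(i)
                active.discard(i)
            elif beaten >= k_rem:           # provably outside the top k_rem
                active.discard(i)

        if len(selected) == k or not active:
            break

    # If the sampling budget ran out, fill remaining slots greedily.
    while len(selected) < k and active:
        best = max(active, key=lambda i: wins[i] / pulls[i])
        selected.add(best)
        active.discard(best)
    return selected

For instance, with prefs = [[0.5, 0.8, 0.9], [0.2, 0.5, 0.7], [0.1, 0.3, 0.5]], race_top_k(prefs, k=2) returns {0, 1} with probability at least 1 - delta.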
Citations

PAC Rank Elicitation through Adaptive Sampling of Stochastic Pairwise Preferences
This work introduces the problem of PAC rank elicitation, which consists of sorting a given set of options based on adaptive sampling of stochastic pairwise preferences, and instantiates this setting with combinations of two different distance measures and ranking procedures.
Top-κ selection with pairwise comparisons
This work adapts two well-known Bayesian sequential sampling techniques, the Knowledge Gradient policy and the Optimal Computing Budget Allocation framework, to the pairwise setting and demonstrates that these methods match or outperform the current state-of-the-art racing-algorithm approach.
Preference-Based Rank Elicitation using Statistical Models: The Case of Mallows
This work addresses the problem of rank elicitation assuming that the underlying data generating process is characterized by a probability distribution on the set of all rankings of a given set of items, and allows the learner to query pairwise preferences.
Best-item Learning in Random Utility Models with Subset Choices
Fundamental lower bounds on PAC sample complexity show that the proposed learning algorithm, based on pairwise relative counts of items and hierarchical elimination, is near-optimal in its dependence on $n$, $k$, and $c$.
Exploiting Transitivity for Top-k Selection with Score-Based Dueling Bandits
This work considers the problem of top-k subset selection in dueling-bandit problems with score information, proposing a Thurstonian-style model and adapting the Pairwise Optimal Computing Budget Allocation for subset selection (POCBAm) sampling method to exploit this model for efficient sample selection.
Active Ranking from Pairwise Comparisons and the Futility of Parametric Assumptions
By means of tight lower bounds, it is proved that, perhaps surprisingly, these popular parametric modeling choices offer little statistical advantage in the context of active ranking from pairwise comparisons.
A Survey of Preference-Based Online Learning with Bandit Algorithms
The aim of this paper is to provide a survey of the state of the art in this field, which it refers to as preference-based multi-armed bandits, and to give an overview of the problems that have been considered in the literature as well as methods for tackling them.
Active Ranking from Pairwise Comparisons and when Parametric Assumptions Don't Help
We consider sequential or active ranking of a set of n items based on noisy pairwise comparisons. Items are ranked according to the probability that a given item beats a randomly chosen item, and…
Active Ranking with Subset-wise Preferences
These algorithms rely on a novel pivot trick to maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score estimates used in prior work.
Batched Coarse Ranking in Multi-Armed Bandits
This work proposes algorithms and proves impossibility results which together give almost-tight tradeoffs between the total number of arm pulls and the number of policy changes in multi-armed bandits (MAB).

References

Showing 1–10 of 30 references
The K-armed Dueling Bandits Problem
A novel regret formulation is proposed for this setting, along with an algorithm that achieves information-theoretically optimal regret bounds (up to a constant factor).
PAC Subset Selection in Stochastic Multi-armed Bandits
The expected sample complexity bound for LUCB is novel even for single-arm selection, and a lower bound on the worst-case sample complexity of PAC algorithms for Explore-m is given.
Iterative ranking from pair-wise comparisons
This paper proposes a novel iterative rank-aggregation algorithm for discovering scores for objects from pairwise comparisons, which performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah.
Preference Learning
This article aims at conveying a first idea of typical preference learning problems, namely learning from label preferences and learning from object preferences.
Tuning Bandit Algorithms in Stochastic Environments
A variant of the basic algorithm for the stochastic multi-armed bandit problem that takes into account the empirical variance of the different arms is considered, and the concentration of the regret is analyzed for the first time.
On the Axiomatic Foundations of Ranking Systems
This paper considers two fundamental axioms of ranking systems, Transitivity and Ranked Independence of Irrelevant Alternatives, and finds that there is no general social ranking rule that satisfies both requirements.
Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search
An adaptive uncertainty-handling scheme based on Hoeffding and empirical Bernstein races is added to CMA-ES, a variable-metric evolution strategy proposed for direct policy search.
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
The bandit problem is revisited under the PAC model, and it is shown that, given $n$ arms, it suffices to pull the arms $O((n/\epsilon^2)\log(1/\delta))$ times to find an $\epsilon$-optimal arm with probability at least $1-\delta$; a uniform-sampling illustration of this style of bound is sketched after this list.
Preference Learning
The editors first offer a thorough introduction, including a systematic categorization according to learning task and learning technique, along with a unified notation; the first half of the book is organized into parts on applications of preference learning in multiattribute domains, information retrieval, and recommender systems.
Finite-time Analysis of the Multiarmed Bandit Problem
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
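The PAC bandit entry above quotes the classic sample-complexity bound. As a rough, hedged illustration (not median elimination, which is what achieves the quoted $O((n/\epsilon^2)\log(1/\delta))$ bound), the Python sketch below implements the simpler uniform-sampling baseline, whose total pull count carries an extra $\log n$ factor; the names naive_pac_best_arm and pull are hypothetical.

import math
import random

def naive_pac_best_arm(pull, n, epsilon, delta):
    # Pull every arm m times and return the empirical best.
    # Hoeffding's inequality plus a union bound over the n arms: with
    # rewards in [0, 1], m = ceil((2 / epsilon**2) * ln(2n / delta))
    # per-arm pulls make the returned arm epsilon-optimal with
    # probability at least 1 - delta. Total pulls are
    # O((n / epsilon^2) log(n / delta)), a log(n) factor above the
    # median-elimination bound quoted above.
    m = math.ceil((2 / epsilon ** 2) * math.log(2 * n / delta))
    means = [sum(pull(i) for _ in range(m)) / m for i in range(n)]
    return max(range(n), key=lambda i: means[i])

# Tiny simulation with three Bernoulli arms of means 0.5, 0.6, 0.9.
arms = [0.5, 0.6, 0.9]
best = naive_pac_best_arm(lambda i: float(random.random() < arms[i]),
                          n=len(arms), epsilon=0.1, delta=0.05)
print(best)  # prints 2 with probability at least 0.95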