• Corpus ID: 227053946

# Fully Gap-Dependent Bounds for Multinomial Logit Bandit

@article{Yang2021FullyGB,
title={Fully Gap-Dependent Bounds for Multinomial Logit Bandit},
author={Jiaqi Yang},
journal={ArXiv},
year={2021},
volume={abs/2011.09998}
}
• Jiaqi Yang
• Published 19 November 2020
• Computer Science
• ArXiv
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items, and the buyer purchases an item from the assortment according to an MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i = 1}^N \Delta_i^{-2})$ time steps with high probability, and (ii) an…
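As a rough illustration of the setting in the abstract, the sketch below computes MNL purchase probabilities and the expected revenue of an assortment, then finds the best assortment of size at most $K$ by brute force when the parameters are known. The preference weights `v`, the revenues `r`, the no-purchase weight of 1, and all function names are assumptions made for this example (a common MNL-bandit parameterization); they are not taken from the paper, and this is not the paper's learning algorithm.

```python
from itertools import combinations


def choice_probabilities(assortment, v):
    """MNL purchase probability of each offered item.

    Assumption: the no-purchase option has preference weight 1 and is
    stored under the key None.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom
    return probs


def expected_revenue(assortment, v, r):
    """Expected revenue of one offer: sum of r_i * P(buyer picks i)."""
    probs = choice_probabilities(assortment, v)
    return sum(r[i] * probs[i] for i in assortment)


def best_assortment(v, r, K):
    """Brute-force search over all non-empty assortments of size <= K.

    Only feasible for small N; used here purely to make the objective concrete.
    """
    n_items = len(v)
    candidates = (
        S for k in range(1, K + 1) for S in combinations(range(n_items), k)
    )
    return max(candidates, key=lambda S: expected_revenue(S, v, r))


# Hypothetical instance with N = 4 items and capacity K = 2.
v = [1.0, 0.5, 0.8, 0.3]  # assumed preference weights
r = [0.4, 1.0, 0.6, 0.9]  # assumed per-item revenues
S_star = best_assortment(v, r, K=2)
```

In the bandit problem the weights `v` are unknown, so the seller has to estimate them from observed purchases; the brute-force search only shows what "optimal assortment" means once the parameters are fixed.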
Instance-Sensitive Algorithms for Pure Exploration in Multinomial Logit Bandit
• Computer Science
ArXiv
• 2020
This paper gives efficient algorithms for pure exploration in MNL-bandit that achieve instance-sensitive pull complexities and complement the upper bounds by an almost matching lower bound.

## References

Thompson Sampling for the MNL-Bandit
• Computer Science, Mathematics
COLT
• 2017
An approach to adapt Thompson Sampling to this problem is presented and it is shown that it achieves near-optimal regret as well as attractive numerical performance.
MNL-Bandit: A Dynamic Learning Approach to Assortment Selection
• Computer Science
Oper. Res.
• 2019
An efficient algorithm is given that simultaneously explores and exploits, achieving performance independent of the underlying parameters. It is adaptive in the sense that its performance is near-optimal both in the "well separated" case and in the general parameter setting where this separation need not hold.
Combinatorial Bandits with Relative Feedback
• Computer Science, Mathematics
NeurIPS
• 2019
We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute.
A Near-Optimal Exploration-Exploitation Approach for Assortment Selection
• Computer Science, Economics
EC
• 2016
It is shown that by exploiting the specific characteristics of the MNL model it is possible to design an algorithm with $\widetilde{O}(\sqrt{NT})$ regret, under a mild assumption, and it is demonstrated that this performance is nearly optimal.
Top-$k$ Combinatorial Bandits with Full-Bandit Feedback
• Computer Science, Mathematics
ALT
• 2020
This work presents the Combinatorial Successive Accepts and Rejects (CSAR) algorithm, which generalizes SAR (Bubeck et al., 2013) for top-k combinatorial bandits, and presents an efficient sampling scheme that uses Hadamard matrices to accurately estimate the individual arms' expected rewards.
Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint
• Computer Science, Mathematics
Oper. Res.
• 2010
This work develops an adaptive policy that learns the unknown parameters from past data and at the same time optimizes the profit and develops a simple algorithm for computing a profit-maximizing assortment based on the geometry of lines in the plane.
• Mathematics, Computer Science
ICML
• 2017
A new hardness parameter for characterizing the difficulty of any given instance is introduced and a lower bound result is proved showing that the extra $\log(\epsilon^{-1})$ is necessary for instance-dependent algorithms using the introduced hardness parameter.
Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
• Computer Science, Mathematics
NeurIPS
• 2018
This paper shows that a trisection-based algorithm achieves an item-independent regret bound of $O(\sqrt{T \log\log T})$, which matches information-theoretic lower bounds up to iterated logarithmic terms.
Combinatorial Multi-Armed Bandit with General Reward Functions
• Wei Chen
• Computer Science, Mathematics
NIPS
• 2016
The stochastic combinatorial multi-armed bandit (CMAB) framework is studied in a setting that allows a general nonlinear reward function, whose expected value may depend not only on the means of the input random variables but possibly on the entire distributions of these variables.
A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model
• Computer Science, Mathematics
SODA
• 2018
This work designs a new active ranking algorithm without using any information about the underlying items' preference scores, and establishes a matching lower bound on the sample complexity even when the set of preference scores is given to the algorithm.