# Fully Gap-Dependent Bounds for Multinomial Logit Bandit

@article{Yang2021FullyGB, title={Fully Gap-Dependent Bounds for Multinomial Logit Bandit}, author={Jiaqi Yang}, journal={ArXiv}, year={2021}, volume={abs/2011.09998} }

We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items, and the buyer purchases an item from the assortment according to an MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment $S^*$ within $\widetilde{O}(\sum_{i = 1}^N \Delta_i^{-2})$ time steps with high probability, and (ii) an…
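To make the setting concrete, here is a minimal sketch of the MNL choice model and the expected-revenue objective the abstract refers to. It assumes the standard convention from the MNL-bandit literature: each item $i$ has a preference weight $v_i$ and a revenue $r_i$, and the no-purchase option has weight $v_0 = 1$. Under that convention, $\Pr(i \mid S) = v_i / (1 + \sum_{j \in S} v_j)$ and $R(S) = \sum_{i \in S} r_i \Pr(i \mid S)$. The function names are hypothetical. The brute-force search enumerates every assortment of size at most $K$, so it is for illustration only and is not the algorithm proposed in the paper.

```python
import itertools
import random


def choice_probabilities(assortment, v):
    """MNL purchase probabilities for each offered item, plus no-purchase.

    Assumes the normalization v_0 = 1 for the no-purchase option, so
    P(i | S) = v[i] / (1 + sum_{j in S} v[j]).
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom  # None stands for "no purchase"
    return probs


def expected_revenue(assortment, v, r):
    """R(S) = sum_{i in S} r_i * P(i | S)."""
    probs = choice_probabilities(assortment, v)
    return sum(r[i] * probs[i] for i in assortment)


def optimal_assortment(v, r, K):
    """Brute-force S* over all assortments of size 1..K.

    Exponential in K; for illustration only, not the paper's method.
    """
    best, best_rev = (), 0.0
    for size in range(1, K + 1):
        for S in itertools.combinations(range(len(v)), size):
            rev = expected_revenue(S, v, r)
            if rev > best_rev:
                best, best_rev = S, rev
    return best, best_rev


def sample_purchase(assortment, v, rng):
    """Simulate one buyer's choice (an item index or None) under the MNL model."""
    probs = choice_probabilities(assortment, v)
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]
```

For example, with `v = [0.5, 1.0, 0.8, 0.3]`, `r = [1.0, 0.6, 0.9, 1.5]` and `K = 2`, `optimal_assortment(v, r, 2)` returns `(2, 3)` with expected revenue $1.17 / 2.1 \approx 0.557$. The optimum is not simply the items with the largest weights. Item 1 has the largest $v_i$ but is left out because its low revenue would dilute the purchase probability of the higher-revenue items. A learner in the bandit setting observes only draws such as those from `sample_purchase`, never `v` itself.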

## One Citation

Instance-Sensitive Algorithms for Pure Exploration in Multinomial Logit Bandit

- Venue: ArXiv (Computer Science)
- 2020

This paper gives efficient algorithms for pure exploration in MNL-bandit that achieve instance-sensitive pull complexities and complement the upper bounds by an almost matching lower bound.

## References

Showing 10 of 36 references.

Thompson Sampling for the MNL-Bandit

- Venue: COLT (Computer Science, Mathematics)
- 2017

An approach for adapting Thompson Sampling to this problem is presented, and it is shown to achieve near-optimal regret as well as attractive numerical performance.

MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

- Venue: Oper. Res. (Computer Science)
- 2019

An efficient algorithm is given that simultaneously explores and exploits, achieving performance independent of the underlying parameters, and is adaptive in the sense that its performance is near-optimal in both the "well separated" case, as well as the general parameter setting where this separation need not hold.

Combinatorial Bandits with Relative Feedback

- Venue: NeurIPS (Computer Science, Mathematics)
- 2019

We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute.…

A Near-Optimal Exploration-Exploitation Approach for Assortment Selection

- Venue: EC (Computer Science, Economics)
- 2016

It is shown that by exploiting the specific characteristics of the MNL model it is possible to design an algorithm with $\widetilde{O}(\sqrt{NT})$ regret, under a mild assumption, and it is demonstrated that this performance is nearly optimal.

Top-$k$ Combinatorial Bandits with Full-Bandit Feedback

- Venue: ALT (Computer Science, Mathematics)
- 2020

This work presents the Combinatorial Successive Accepts and Rejects (CSAR) algorithm, which generalizes SAR (Bubeck et al., 2013) to top-k combinatorial bandits, and an efficient sampling scheme that uses Hadamard matrices to accurately estimate the individual arms' expected rewards.

Dynamic Assortment Optimization with a Multinomial Logit Choice Model and Capacity Constraint

- Venue: Oper. Res. (Computer Science, Mathematics)
- 2010

This work develops an adaptive policy that learns the unknown parameters from past data while simultaneously optimizing profit, together with a simple algorithm for computing a profit-maximizing assortment based on the geometry of lines in the plane.

Adaptive Multiple-Arm Identification

- Venue: ICML (Mathematics, Computer Science)
- 2017

A new hardness parameter for characterizing the difficulty of any given instance is introduced and a lower bound result is proved showing that the extra $\log(\epsilon^{-1})$ is necessary for instance-dependent algorithms using the introduced hardness parameter.

Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

- Venue: NeurIPS (Computer Science, Mathematics)
- 2018

This paper shows that a trisection-based algorithm achieves an item-independent regret bound of $O(\sqrt{T \log\log T})$, which matches information-theoretic lower bounds up to iterated logarithmic terms.

Combinatorial Multi-Armed Bandit with General Reward Functions

- Venue: NIPS (Computer Science, Mathematics)
- 2016

This work studies the stochastic combinatorial multi-armed bandit (CMAB) framework with a general nonlinear reward function, whose expected value may depend not only on the means of the input random variables but possibly on their entire distributions.

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model

- Venue: SODA (Computer Science, Mathematics)
- 2018

This work designs a new active ranking algorithm without using any information about the underlying items' preference scores, and establishes a matching lower bound on the sample complexity even when the set of preference scores is given to the algorithm.