# Jointly Efficient and Optimal Algorithms for Logistic Bandits

```bibtex
@inproceedings{Faury2022JointlyEA,
  title     = {Jointly Efficient and Optimal Algorithms for Logistic Bandits},
  author    = {Louis Faury and Marc Abeille and Kwang-Sung Jun and Cl{\'e}ment Calauz{\`e}nes},
  booktitle = {AISTATS},
  year      = {2022}
}
```

Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by exponentially large factors. Such algorithms are, however, strikingly costly, as they require Ω(t) operations at each round. On the other hand, a different line of research focused on computational efficiency (O(1) per-round cost), but at the cost of letting go of…
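As background (a minimal sketch, not from the paper itself), the logistic bandit reward model assumed throughout this line of work is a Bernoulli reward whose mean is the sigmoid of the inner product between the chosen arm and an unknown parameter:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d = 3
theta_star = rng.normal(size=d)   # unknown parameter (illustrative)
x = rng.normal(size=d)            # chosen arm / context vector

p = sigmoid(x @ theta_star)       # mean reward, always in (0, 1)
reward = rng.binomial(1, p)       # observed binary reward
```

The non-linearity of the sigmoid is what makes the statistical and computational trade-offs discussed in the abstract non-trivial compared to linear bandits.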

## One Citation

### Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

- Computer Science, ArXiv
- 2022

The information-theoretic perspective of Russo and Van Roy [2016] is adapted to the contextual setting by introducing a new concept of information ratio based on the mutual information between the unknown model parameter and the observed loss that allows the regret to be bound in terms of the entropy of the prior distribution.

## References

(Showing 1–10 of 19 references)

### Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits

- Computer Science, AISTATS
- 2021

A novel algorithm is introduced for which, in the permanent regime, the non-linearity can dramatically ease the exploration-exploitation trade-off; the resulting rate is proved minimax-optimal by deriving an $\Omega(d\sqrt{T/\kappa})$ problem-dependent lower bound.

### Thompson Sampling for Contextual Bandits with Linear Payoffs

- Computer Science, ICML
- 2013

A generalization of the Thompson Sampling algorithm is designed and analyzed for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary.

### Improved Optimistic Algorithms for Logistic Bandits

- Computer Science, ICML
- 2020

A new optimistic algorithm is proposed, based on a finer examination of the non-linearities of the reward function, that enjoys a $\tilde{\mathcal{O}}(\sqrt{T})$ regret with no dependency on $\kappa$ except in a second-order term.

### Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits

- Computer Science, ICML
- 2021

Fixed-design confidence bounds for the linear logistic model are derived that improve upon the state-of-the-art bound by Li et al. (2017), along with a lower bound highlighting a dependence on $1/\kappa$ for a family of instances.

### Improved Algorithms for Linear Stochastic Bandits

- Computer Science, NIPS
- 2011

A simple modification of Auer's UCB algorithm is shown to achieve constant regret with high probability; the regret bound is improved by a logarithmic factor, and experiments show a vast empirical improvement.

### Online Stochastic Linear Optimization under One-bit Feedback

- Computer Science, ICML
- 2016

This paper develops an efficient online learning algorithm for a special bandit setting of online stochastic linear optimization, exploiting particular structures of the observation model to minimize the regret defined by the unknown linear function.

### Linear Thompson Sampling Revisited

- Computer Science, AISTATS
- 2017

Thompson sampling can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach.
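The view of Thompson Sampling described above can be sketched in code. This is a minimal illustrative single round of linear Thompson Sampling; the function name, regularization, and the inflation parameter `v` are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def linear_ts_step(V, b, arms, v=1.0, rng=None):
    """One round of linear Thompson Sampling (illustrative sketch).

    V    : (d, d) regularized Gram matrix of past arm vectors
    b    : (d,) sum of reward-weighted past arm vectors
    arms : (K, d) candidate arm vectors for this round
    v    : posterior inflation factor; larger values make the sampled
           parameter more likely to be optimistic (at extra regret cost)
    """
    rng = rng or np.random.default_rng()
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                                  # ridge estimate
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * V_inv)
    return int(np.argmax(arms @ theta_tilde))              # greedy w.r.t. sample

rng = np.random.default_rng(1)
d, K = 2, 5
V = np.eye(d)          # identity regularizer, no data observed yet
b = np.zeros(d)
arms = rng.normal(size=(K, d))
chosen = linear_ts_step(V, b, arms, rng=rng)
```

The perturbation `theta_tilde` plays the role of the randomized optimism mentioned in the summary: with fixed probability the sample lands on the optimistic side of the confidence region.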

### Scalable Generalized Linear Bandits: Online Computation and Hashing

- Computer Science, NIPS
- 2017

A novel Generalized Linear extension of the Online-to-Confidence-Set Conversion (the GLOC method) is introduced that takes *any* online learning algorithm and turns it into a GLB algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work.

### An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling

- Computer Science, AISTATS
- 2021

The proposed SGD-TS algorithm, which uses a single-step SGD update to exploit past information and uses Thompson Sampling for exploration, achieves regret with the total time complexity that scales linearly in T and d, where T is the total number of rounds and d is the number of features.
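The single-step SGD idea behind SGD-TS can be sketched as follows. This is an illustrative gradient step on the logistic loss for one observed (arm, reward) pair; the function name and step size are assumptions, not the authors' exact update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_update(theta, x, reward, eta=0.1):
    """One gradient step on the logistic loss for a single (arm, reward) pair.

    Per-round cost is O(d): the update touches only the current observation,
    unlike maximum-likelihood-based methods that reprocess the full history.
    """
    grad = (sigmoid(x @ theta) - reward) * x   # gradient of the neg. log-likelihood
    return theta - eta * grad

theta = np.zeros(3)
x = np.array([1.0, 0.5, -0.2])
theta = sgd_update(theta, x, reward=1)   # theta moves toward explaining reward=1
```

This constant per-round cost is what lets the total time complexity scale linearly in T and d, as claimed in the summary.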

### Efficient improper learning for online logistic regression

- Computer Science, COLT
- 2020

An efficient improper algorithm is designed that avoids an exponential multiplicative constant while preserving logarithmic regret, satisfying a regret scaling as O(B log(Bn)) with a per-round time complexity of order O(d^2).