• Corpus ID: 245769736

# Jointly Efficient and Optimal Algorithms for Logistic Bandits

```bibtex
@inproceedings{Faury2022JointlyEA,
  title={Jointly Efficient and Optimal Algorithms for Logistic Bandits},
  author={Louis Faury and Marc Abeille and Kwang-Sung Jun and Cl{\'e}ment Calauz{\`e}nes},
  booktitle={AISTATS},
  year={2022}
}
```
• Published in AISTATS 6 January 2022
• Computer Science
Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by exponentially large factors. Such algorithms are however strikingly costly as they require Ω(t) operations at each round. On the other hand, a different line of research focused on computational efficiency (O(1) per-round cost), but at the cost of letting go of…
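The cost trade-off the abstract describes can be made concrete with a minimal sketch (illustrative only, not the paper's algorithm): a statistically efficient strategy refits the logistic maximum-likelihood estimate on the full history each round, paying Ω(t) work at round t, whereas a computationally efficient strategy takes a single gradient step on the newest sample, paying constant work per round. The function names and hyperparameters below are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mle_refit(X, y, lam=1.0, n_iter=25):
    # Statistically efficient route: refit the regularized logistic MLE
    # on the full history each round -- cost grows with the history size t.
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(n_iter):  # Newton iterations
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        w = p * (1.0 - p)
        H = X.T @ (X * w[:, None]) + lam * np.eye(d)
        theta = theta - np.linalg.solve(H, grad)
    return theta

def online_step(theta, x, r, lr=0.1):
    # Computationally efficient route: one gradient step on the newest
    # sample only -- constant work per round, independent of t.
    return theta - lr * (sigmoid(x @ theta) - r) * x
```

Both estimators converge to the underlying parameter on synthetic logistic data; the tension the paper addresses is getting the statistical quality of the first at (near) the per-round cost of the second.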
1 Citation

## Figures and Tables from this paper

### Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

• Computer Science
ArXiv
• 2022
The information-theoretic perspective of Russo and Van Roy [2016] is adapted to the contextual setting by introducing a new concept of information ratio based on the mutual information between the unknown model parameter and the observed loss that allows the regret to be bound in terms of the entropy of the prior distribution.

## References

SHOWING 1-10 OF 19 REFERENCES

### Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits

• Computer Science
AISTATS
• 2021
A novel algorithm is introduced for which, in the permanent regime, non-linearity can dramatically ease the exploration-exploitation trade-off, and its regret rate is proved minimax-optimal by deriving an $\Omega(d\sqrt{T/\kappa})$ problem-dependent lower bound.

### Thompson Sampling for Contextual Bandits with Linear Payoffs

• Computer Science
ICML
• 2013
A generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions is designed and analyzed, in the setting where the contexts are provided by an adaptive adversary.
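The referenced scheme can be sketched in a few lines (a minimal illustration under standard linear-bandit assumptions, not the paper's exact algorithm or constants): maintain a regularized least-squares estimate, sample a perturbed parameter from a Gaussian centered at it, and play the arm that is greedy for the sample. The function names and the scale parameter `v` are assumptions made for this sketch.

```python
import numpy as np

def lin_ts_round(V, b, arms, v=1.0, rng=None):
    # One round of linear Thompson Sampling: sample a parameter from a
    # Gaussian centered at the ridge estimate, then play the greedy arm.
    if rng is None:
        rng = np.random.default_rng()
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b            # regularized least-squares estimate
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * V_inv)
    return int(np.argmax(arms @ theta_tilde))

def lin_ts_update(V, b, x, reward):
    # Rank-one update of the design matrix and the response vector.
    return V + np.outer(x, x), b + reward * x
```

On a toy two-arm instance the sampler quickly concentrates its plays on the better arm, since the posterior-like covariance `V_inv` shrinks along directions that have been played.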

### Improved Optimistic Algorithms for Logistic Bandits

• Computer Science
ICML
• 2020
A new optimistic algorithm is proposed, based on a finer examination of the non-linearities of the reward function, that enjoys a $\tilde{\mathcal{O}}(\sqrt{T})$ regret with no dependence on $\kappa$ except in a second-order term.

### Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits

• Computer Science
ICML
• 2021
Fixed-design confidence bounds for the linear logistic model are derived that improve upon the state-of-the-art bound of Li et al. (2017), together with a lower bound highlighting a dependence on $1/\kappa$ for a family of instances.

### Improved Algorithms for Linear Stochastic Bandits

• Computer Science
NIPS
• 2011
A simple modification of Auer's UCB algorithm improves the regret bound by a logarithmic factor and achieves constant regret with high probability; although the theoretical gain is only logarithmic, experiments show a vast improvement in practice.

### Online Stochastic Linear Optimization under One-bit Feedback

• Computer Science
ICML
• 2016
This paper develops an efficient online learning algorithm for a bandit setting of online stochastic linear optimization with one-bit feedback, exploiting particular structures of the observation model to minimize the regret defined by the unknown linear function.

### Linear Thompson Sampling Revisited

• Computer Science
AISTATS
• 2017
Thompson sampling can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach.

### Scalable Generalized Linear Bandits: Online Computation and Hashing

• Computer Science
NIPS
• 2017
A novel Generalized Linear extension of the Online-to-Confidence-set Conversion (GLOC) is proposed that takes *any* online learning algorithm and turns it into a GLB algorithm, resulting in a low-regret GLB algorithm with much lower time and memory complexity than prior work.

### An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling

• Computer Science
AISTATS
• 2021
The proposed SGD-TS algorithm uses a single-step SGD update to exploit past information and Thompson Sampling for exploration, achieving low regret with a total time complexity that scales linearly in T and d, where T is the total number of rounds and d is the number of features.

### Efficient improper learning for online logistic regression

• Computer Science
COLT
• 2020
An efficient improper algorithm is designed that avoids an exponential multiplicative constant while preserving logarithmic regret, satisfying a regret bound scaling as O(B log(Bn)) with a per-round time complexity of order O(d^2).