Corpus ID: 245769736

Jointly Efficient and Optimal Algorithms for Logistic Bandits

@inproceedings{Faury2022JointlyEA,
  title={Jointly Efficient and Optimal Algorithms for Logistic Bandits},
  author={Louis Faury and Marc Abeille and Kwang-Sung Jun and Cl{\'e}ment Calauz{\`e}nes},
  booktitle={AISTATS},
  year={2022}
}
Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by exponentially large factors. Such algorithms are, however, strikingly costly as they require $\Omega(t)$ operations at each round. On the other hand, a different line of research focused on computational efficiency ($O(1)$ per-round cost), but at the cost of letting go of…
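For context, the logistic bandit model behind the abstract can be sketched as follows: rewards are Bernoulli with mean given by the sigmoid of a linear score $x^\top\theta_\star$, and a computationally cheap learner can update its estimate with a single stochastic-gradient step on the logistic loss, i.e. $O(d)$ work per round and $O(1)$ in $t$. This is a minimal illustration of the model and of the cheap-update regime, not the paper's algorithm; the greedy arm choice, step size, and arm distribution are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, T = 5, 2000
theta_star = rng.normal(size=d)   # unknown parameter of the environment
theta_hat = np.zeros(d)           # learner's running estimate
lr = 0.1                          # step size (illustrative choice)

for t in range(T):
    arms = rng.normal(size=(10, d))      # 10 candidate arms this round
    # greedy choice under the current estimate (no exploration, for brevity)
    x = arms[np.argmax(arms @ theta_hat)]
    # Bernoulli reward with mean sigmoid(x . theta_star)
    r = rng.binomial(1, sigmoid(x @ theta_star))
    # one SGD step on the logistic loss: cost independent of the round index t
    theta_hat -= lr * (sigmoid(x @ theta_hat) - r) * x
```

The per-round cost contrast in the abstract is between schemes like this (constant work per round) and statistically stronger methods that re-solve a maximum-likelihood problem over all $t$ past observations each round.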


Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

The information-theoretic perspective of Russo and Van Roy [2016] is adapted to the contextual setting by introducing a new concept of information ratio, based on the mutual information between the unknown model parameter and the observed loss, that allows the regret to be bounded in terms of the entropy of the prior distribution.

References

Showing 1-10 of 19 references

Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits

A novel algorithm is introduced for which, in the permanent regime, the non-linearity can dramatically ease the exploration-exploitation trade-off, and this rate is proved minimax-optimal by deriving an $\Omega(d\sqrt{T/\kappa})$ problem-dependent lower bound.

Thompson Sampling for Contextual Bandits with Linear Payoffs

A generalization of the Thompson Sampling algorithm is designed and analyzed for the stochastic contextual multi-armed bandit problem with linear payoff functions, when the contexts are provided by an adaptive adversary.

Improved Optimistic Algorithms for Logistic Bandits

A new optimistic algorithm is proposed, based on a finer examination of the non-linearities of the reward function, that enjoys a $\tilde{\mathcal{O}}(\sqrt{T})$ regret with no dependency on $\kappa$ except in a second-order term.
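The problem-dependent constant $\kappa$ recurring in these summaries is commonly defined as the inverse of the smallest derivative of the sigmoid over the decision set, and it grows exponentially with the norm bound $S$ on the scores; a quick numeric check (illustrative values of $S$ only):

```python
import math

def dsigmoid(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

# kappa = 1 / min derivative of the sigmoid over |z| <= S,
# attained at the boundary z = S since dsigmoid decreases on [0, S)
for S in (1, 5, 10):
    kappa = 1.0 / dsigmoid(S)
    print(S, round(kappa, 1))   # approx. 5.1, 150.4, 22028.5
```

Since $1/(\sigma(z)(1-\sigma(z))) = 2 + e^{z} + e^{-z}$, the growth is indeed exponential in $S$, which is why removing the $\kappa$ dependence from the leading regret term is the "exponentially large" improvement the abstract refers to.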

Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits

New fixed-design confidence bounds for the linear logistic model are derived that improve upon the state-of-the-art bound by Li et al. (2017), and a lower bound is provided highlighting a dependence on $1/\kappa$ for a family of instances.

Improved Algorithms for Linear Stochastic Bandits

A simple modification of Auer's UCB algorithm achieves constant regret with high probability and improves the regret bound by a logarithmic factor, while experiments show a vast improvement in practice.

Online Stochastic Linear Optimization under One-bit Feedback

This paper develops an efficient online learning algorithm for a bandit setting of online stochastic linear optimization with one-bit feedback, exploiting particular structures of the observation model to minimize regret with respect to the unknown linear function.

Linear Thompson Sampling Revisited

Thompson sampling can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach.

Scalable Generalized Linear Bandits: Online Computation and Hashing

A novel Generalized Linear extension of the Online-to-Confidence-set Conversion (GLOC) is proposed that takes any online learning algorithm and turns it into a generalized linear bandit (GLB) algorithm, resulting in a low-regret GLB algorithm with much lower time and memory complexity than prior work.

An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling

The proposed SGD-TS algorithm uses a single-step SGD update to exploit past information and Thompson Sampling for exploration, achieving regret guarantees with a total time complexity that scales linearly in $T$ and $d$, where $T$ is the total number of rounds and $d$ is the number of features.

Efficient improper learning for online logistic regression

An efficient improper algorithm is designed that avoids an exponential multiplicative constant while preserving logarithmic regret: it satisfies a regret scaling as $O(B \log(Bn))$ with a per-round time complexity of order $O(d^2)$.