# On Information Gain and Regret Bounds in Gaussian Process Bandits

@article{Vakili2020OnIG, title={On Information Gain and Regret Bounds in Gaussian Process Bandits}, author={Sattar Vakili and Kia Khezeli and Victor Picheny}, journal={ArXiv}, year={2020}, volume={abs/2009.06966} }

Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be viewed as a continuum-armed bandit problem. Upper bounds on the regret of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). The regret bounds often rely on…
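GP-UCB is one of the algorithms named in the abstract. At each round it queries the point that maximizes the upper confidence bound $\mu_{t-1}(x) + \beta_t^{1/2}\sigma_{t-1}(x)$, which is built from the GP posterior mean and standard deviation. The code below is a minimal pure-Python sketch on a finite grid and is not code from the paper. It makes these assumptions:

- a squared-exponential kernel with lengthscale 0.2;
- Gaussian observation noise;
- the finite-domain schedule $\beta_t = 2\log(|D|t^2\pi^2/(6\delta))$ from Srinivas et al. (2010).

The helper names (`rbf`, `fit_posterior`, `gp_ucb`) and the toy objective are hypothetical.

```python
import math
import random


def rbf(x, y, lengthscale=0.2):
    """Squared-exponential kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve L z = b for lower-triangular L."""
    z = []
    for i, b_i in enumerate(b):
        z.append((b_i - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def fit_posterior(xs, ys, noise_var, kernel=rbf):
    """Return x -> (posterior mean, posterior variance) of a zero-mean GP."""
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    w = solve_lower(low, ys)  # L^{-1} y

    def predict(x):
        v = solve_lower(low, [kernel(x_i, x) for x_i in xs])  # L^{-1} k_t(x)
        mean = sum(v_i * w_i for v_i, w_i in zip(v, w))
        var = kernel(x, x) - sum(v_i * v_i for v_i in v)
        return mean, max(var, 0.0)

    return predict


def gp_ucb(f, domain, T, noise_std=0.1, delta=0.1, seed=0):
    """Run GP-UCB for T rounds on a finite domain; return queries and noisy observations."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(1, T + 1):
        # Finite-domain exploration schedule (assumed, after Srinivas et al. 2010).
        beta = 2.0 * math.log(len(domain) * t ** 2 * math.pi ** 2 / (6.0 * delta))
        predict = fit_posterior(xs, ys, noise_std ** 2)

        def ucb(x):
            mean, var = predict(x)
            return mean + math.sqrt(beta * var)

        x_t = max(domain, key=ucb)
        xs.append(x_t)
        ys.append(f(x_t) + rng.gauss(0.0, noise_std))
    return xs, ys


# Usage on a hypothetical toy objective over a 21-point grid in [0, 1].
domain = [i / 20 for i in range(21)]
queries, observations = gp_ucb(lambda x: math.sin(6 * x), domain, T=20)
```

The early rounds spread across the domain because $\beta_t$ is large. After that, the posterior mean increasingly drives the choice. Plain lists keep the sketch dependency-free. A practical implementation would use a linear-algebra library and update the Cholesky factor incrementally instead of refactorizing every round.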

## 54 Citations

### Lenient Regret and Good-Action Identification in Gaussian Process Bandits

- Computer Science · ICML
- 2021

This paper considers the problem of finding a single “good action” according to a known pre-specified threshold, and introduces several good-action identification algorithms that exploit knowledge of the threshold.

### Tighter Bounds on the Log Marginal Likelihood of Gaussian Process Regression Using Conjugate Gradients

- Computer Science · ICML
- 2021

It is shown that approximate maximum likelihood learning of model parameters by maximising the lower bound retains many benefits of the sparse variational approach while reducing the bias introduced into hyperparameter learning.

### A Bayesian Approach for Stochastic Continuum-armed Bandit with Long-term Constraints

- Computer Science · AISTATS
- 2022

A generic method based on a Bayesian approach and a class of penalty functions is proposed. It is proved to achieve sublinear regret with respect to the global optimum and sublinear constraint violation (CV), matching the best results of previous methods.

### Weighted Gaussian Process Bandits for Non-stationary Environments

- Computer Science · AISTATS
- 2022

WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression with general weights, is developed. It comes with the first (frequentist) sublinear regret guarantee for weighted time-varying bandits with general nonlinear rewards.

### Delayed Feedback in Kernel Bandits

- Computer Science · ArXiv
- 2023

This work considers a kernel bandit problem under stochastically delayed feedback. It proposes an algorithm that significantly improves on the state-of-the-art regret bound of $\tilde{\mathcal{O}}(\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T))$, trivializing the existing results.

### Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

- Computer Science · ICML
- 2022

Novel confidence intervals are provided for the Nyström method and the sparse variational Gaussian process approximation method. They are established through new interpretations of the approximate (surrogate) posterior variance of these models.

### Optimal Order Simple Regret for Gaussian Process Bandits

- Computer Science · NeurIPS
- 2021

This work proves an $\tilde{\mathcal{O}}(\sqrt{\gamma_N/N})$ bound on the simple regret of a pure exploration algorithm. The bound is significantly tighter than existing bounds and is order optimal up to logarithmic factors in the cases where a lower bound on regret is known.

### Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws

- Computer Science
- 2023

This work proposes a theoretical framework for studying reward learning and the associated optimal experiment design problem. It shows that the well-studied problem of Gaussian process (GP) bandit optimization is a special case of this framework, and that its bounds either improve on or are competitive with known regret guarantees for the Matérn kernel.

### Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

- Computer Science · ArXiv
- 2023

A simple and unified analysis of stochastic partial monitoring is presented. It shows that a single algorithm, information-directed sampling (IDS), is (nearly) worst-case rate optimal in all finite-action games.

### (Private) Kernelized Bandits with Distributed Biased Feedback

- Computer Science · Proc. ACM Meas. Anal. Comput. Syst.
- 2023

A new distributed phase-then-batch-based elimination (DPBE) algorithm is proposed. It samples users in phases when collecting feedback, which reduces the bias, and employs maximum variance reduction to select actions in batches within each phase. This can significantly reduce both the communication cost and the computational complexity of distributed kernelized bandits.
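Maximum variance reduction is the selection rule DPBE uses within a batch. It queries the point with the largest GP posterior variance. GP posterior variance does not depend on the observed rewards, so a whole batch can be chosen before any feedback arrives.

The code below is a generic, single-agent sketch of that rule. It ends with a recommendation at the maximizer of the posterior mean, which is one common choice. It does not reproduce DPBE's phases, user sampling, or privacy mechanism. The squared-exponential kernel, the noise level, the toy objective, and all function names are illustrative assumptions.

```python
import math
import random


def rbf(x, y, lengthscale=0.2):
    """Squared-exponential kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve L z = b for lower-triangular L."""
    z = []
    for i, b_i in enumerate(b):
        z.append((b_i - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def fit_posterior(xs, ys, noise_var, kernel=rbf):
    """Return x -> (posterior mean, posterior variance) of a zero-mean GP."""
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    w = solve_lower(low, ys)

    def predict(x):
        v = solve_lower(low, [kernel(x_i, x) for x_i in xs])
        mean = sum(v_i * w_i for v_i, w_i in zip(v, w))
        return mean, max(kernel(x, x) - sum(v_i * v_i for v_i in v), 0.0)

    return predict


def max_variance_reduction(f, domain, N, noise_std=0.1, seed=0, kernel=rbf):
    """Select N points by maximum posterior variance, then recommend argmax of the mean."""
    rng = random.Random(seed)
    noise_var = noise_std ** 2
    xs = []
    for _ in range(N):
        # Posterior variance ignores y, so dummy targets suffice for selection.
        predict = fit_posterior(xs, [0.0] * len(xs), noise_var, kernel)
        xs.append(max(domain, key=lambda x: predict(x)[1]))
    # Feedback for the whole batch is only needed after selection.
    ys = [f(x) + rng.gauss(0.0, noise_std) for x in xs]
    predict = fit_posterior(xs, ys, noise_var, kernel)
    x_hat = max(domain, key=lambda x: predict(x)[0])
    return xs, x_hat


# Usage on a hypothetical toy objective over a 21-point grid in [0, 1].
domain = [i / 20 for i in range(21)]
batch, recommendation = max_variance_reduction(lambda x: math.sin(6 * x), domain, N=10)
```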

## References

Showing 10 of 46 references.

### Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

- Computer Science · COLT
- 2017

This paper provides algorithm-independent lower bounds on two quantities:

- the simple regret, which measures the suboptimality of a single point reported after $T$ rounds;
- the cumulative regret, which measures the sum of regrets over the $T$ chosen points.

### Tight Regret Bounds for Bayesian Optimization in One Dimension

- Computer Science · ICML
- 2018

A theoretical analysis shows that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $\Omega(\sqrt{T})$ and $O(\sqrt{T\log T})$. This is a tight characterization up to a $\sqrt{\log T}$ factor, and it includes the first non-trivial lower bound for noisy BO.

### Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

- Computer Science · ICML
- 2010

This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of the maximal information gain. This establishes a novel connection between GP optimization and experimental design, and yields explicit sublinear regret bounds for many commonly used covariance functions.
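The maximal information gain is $\gamma_T = \max_{A \subset D,\, |A| = T} I(y_A; f)$. For a GP prior with Gaussian noise of variance $\sigma^2$, the mutual information has the closed form $I(y_A; f) = \tfrac{1}{2}\log\det(I + \sigma^{-2}K_A)$.

The code below is a minimal sketch that evaluates this closed form with a Cholesky factor. It then estimates $\gamma_T$ from below by greedy selection over a finite candidate set. The squared-exponential kernel, the lengthscale, the noise variance, and the function names are illustrative assumptions.

```python
import math


def rbf(x, y, lengthscale=0.2):
    """Squared-exponential kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def information_gain(points, noise_var=0.01, kernel=rbf):
    """I(y_A; f) = 1/2 log det(I + K_A / noise_var) for a zero-mean GP prior on f."""
    n = len(points)
    m = [[(1.0 if i == j else 0.0) + kernel(points[i], points[j]) / noise_var
          for j in range(n)] for i in range(n)]
    low = cholesky(m)
    # log det(M) = 2 * sum(log L_ii), so the factor 1/2 cancels the 2.
    return sum(math.log(low[i][i]) for i in range(n))


def greedy_information_gain(candidates, T, noise_var=0.01, kernel=rbf):
    """Greedily add the candidate that most increases I(y_A; f); return (points, gain)."""
    chosen = []
    for _ in range(T):
        best = max(candidates,
                   key=lambda x: information_gain(chosen + [x], noise_var, kernel))
        chosen.append(best)
    return chosen, information_gain(chosen, noise_var, kernel)


# Usage: greedy estimates of gamma_T on a hypothetical 21-point grid in [0, 1].
grid = [i / 20 for i in range(21)]
for T in (5, 10):
    _, gain = greedy_information_gain(grid, T)
    print(T, gain)
```

Information gain is submodular, so the greedy value is within a factor $(1 - 1/e)$ of the best $T$-subset of the candidates. Each additional point contributes at most $\tfrac{1}{2}\log(1 + \sigma^{-2})$ when $k(x, x) = 1$.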

### Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

- Computer Science · COLT
- 2019

This work introduces BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback. BKB achieves near-optimal regret (and hence a near-optimal convergence rate) with near-constant per-iteration complexity. Remarkably, it makes no assumption on the input space or on the covariance of the GP.

### On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

- Computer Science · ICML
- 2021

In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…

### Scalable Thompson Sampling using Sparse Gaussian Process Models

- Computer Science · NeurIPS
- 2021

This work proves regret bounds for TS based on approximate GP posteriors, whose application to sparse GPs shows that the improvement in computational complexity can be achieved with no loss in terms of the order of regret performance.

### Linear Thompson Sampling Revisited

- Computer Science · AISTATS
- 2017

Thompson sampling can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach.

### On Kernelized Multi-armed Bandits

- Computer Science · ICML
- 2017

This work provides two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), and derives the corresponding regret bounds. It also establishes a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.
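GP-TS replaces the confidence bound with a random draw. At each round it samples a function from the (scaled) GP posterior over the domain and queries that sample's maximizer.

The code below is a minimal sketch on a finite grid with these assumptions:

- a squared-exponential kernel;
- a constant posterior scaling `scale`, whereas the analysis of GP-TS uses a round-dependent scaling that is not reproduced here;
- a small diagonal `jitter`, which keeps the Cholesky factorization of the posterior covariance numerically stable.

All names and the toy objective are hypothetical.

```python
import math
import random


def rbf(x, y, lengthscale=0.2):
    """Squared-exponential kernel on the real line (an illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve L z = b for lower-triangular L."""
    z = []
    for i, b_i in enumerate(b):
        z.append((b_i - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def posterior_over(domain, xs, ys, noise_var, kernel=rbf):
    """Joint GP posterior mean vector and covariance matrix over a finite domain."""
    n = len(xs)
    gram = [[kernel(xs[i], xs[j]) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    low = cholesky(gram)
    w = solve_lower(low, ys)
    # vs[a] holds L^{-1} k_t(domain[a]); empty when there are no observations.
    vs = [solve_lower(low, [kernel(x_i, x) for x_i in xs]) for x in domain]
    mean = [sum(v_i * w_i for v_i, w_i in zip(v, w)) for v in vs]
    cov = [[kernel(a, b) - sum(p * q for p, q in zip(va, vb))
            for b, vb in zip(domain, vs)] for a, va in zip(domain, vs)]
    return mean, cov


def gp_ts(f, domain, T, noise_std=0.1, scale=1.0, jitter=1e-6, seed=0):
    """Run GP-TS for T rounds on a finite domain; return queries and noisy observations."""
    rng = random.Random(seed)
    m = len(domain)
    xs, ys = [], []
    for _ in range(T):
        mean, cov = posterior_over(domain, xs, ys, noise_std ** 2)
        for i in range(m):
            cov[i][i] += jitter  # numerical safeguard for the factorization
        low = cholesky(cov)
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # One joint draw from N(mean, scale^2 * cov) as mean + scale * L z.
        sample = [mean[i] + scale * sum(low[i][k] * z[k] for k in range(i + 1))
                  for i in range(m)]
        best = max(range(m), key=lambda i: sample[i])
        xs.append(domain[best])
        ys.append(f(domain[best]) + rng.gauss(0.0, noise_std))
    return xs, ys


# Usage on a hypothetical toy objective over a 21-point grid in [0, 1].
domain = [i / 20 for i in range(21)]
queries, observations = gp_ts(lambda x: math.sin(6 * x), domain, T=20)
```

Sampling the whole grid jointly costs a Cholesky factorization of an $|D| \times |D|$ matrix per round. That cost is one reason the sparse and approximate-posterior variants listed on this page exist.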

### Stochastic Linear Optimization under Bandit Feedback

- Computer Science, Mathematics · COLT
- 2008

A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.

### Gaussian Process bandits with adaptive discretization

- Computer Science · ArXiv
- 2017

In this paper, the problem of maximizing a black-box function $f:\mathcal{X} \to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian Process (GP) prior, and high-probability bounds on its simple and cumulative regret are established.