• Corpus ID: 221703244

# On Information Gain and Regret Bounds in Gaussian Process Bandits

@article{Vakili2020OnIG,
title={On Information Gain and Regret Bounds in Gaussian Process Bandits},
author={Sattar Vakili and Kia Khezeli and Victor Picheny},
journal={ArXiv},
year={2020},
volume={abs/2009.06966}
}
• Published 15 September 2020
• Computer Science
• ArXiv
Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be viewed as a continuum-armed bandit problem. Upper bounds on the regret performance of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). The regret bounds often rely on…
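The GP-UCB strategy mentioned in the abstract can be illustrated with a minimal sketch: at each round, fit a GP posterior to the noisy observations so far and query the point maximizing an optimistic upper confidence bound $\mu_t(x) + \sqrt{\beta}\,\sigma_t(x)$. This is a toy illustration on a discretised 1-D domain with an RBF kernel; the function names and the fixed `beta` are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential (RBF) kernel between 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X_obs, y_obs, X_query, noise_var=0.01):
    """GP posterior mean and variance at X_query given noisy observations."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    Ks = rbf_kernel(X_query, X_obs)
    Kinv = np.linalg.inv(K)
    mean = Ks @ Kinv @ y_obs
    # Prior variance k(x, x) = 1 for the RBF kernel used here.
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, Kinv, Ks)
    return mean, np.maximum(var, 1e-12)

def gp_ucb(f, domain, T=30, beta=2.0, noise_std=0.1, rng=None):
    """Run GP-UCB for T rounds on a discretised domain; return queried points."""
    rng = np.random.default_rng(rng)
    X = [domain[0]]
    y = [f(domain[0]) + noise_std * rng.standard_normal()]
    for _ in range(T - 1):
        mu, var = gp_posterior(np.array(X), np.array(y), domain, noise_std**2)
        x = domain[np.argmax(mu + np.sqrt(beta * var))]  # optimistic choice
        X.append(x)
        y.append(f(x) + noise_std * rng.standard_normal())
    return np.array(X)

domain = np.linspace(0.0, 1.0, 101)
f = lambda x: -(x - 0.7) ** 2  # toy objective with maximum at x = 0.7
queries = gp_ucb(f, domain, T=30, rng=0)
```

In the theory the exploration parameter $\beta$ grows with $t$ and with the confidence level; a constant is used above only to keep the sketch short.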

## Citations

• Computer Science
ICML
• 2021
This paper considers the problem of finding a single “good action” according to a known pre-specified threshold, and introduces several good-action identification algorithms that exploit knowledge of the threshold.
• Computer Science
ICML
• 2021
It is shown that approximate maximum likelihood learning of model parameters by maximising the lower bound retains many benefits of the sparse variational approach while reducing the bias introduced into hyperparameter learning.
• Computer Science
AISTATS
• 2022
A generic method using a Bayesian approach based on a class of penalty functions is proposed, and it is proved that it can achieve a sublinear regret with respect to the global optimum and a sublinear constraint violation (CV), which can match the best results of previous methods.
• Yuntian Deng
• Computer Science
AISTATS
• 2022
WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression with general weights, is developed, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards.
• Computer Science
ArXiv
• 2023
This work considers a kernel bandit problem under stochastically delayed feedback, and proposes an algorithm with a significant improvement over the state-of-the-art regret bound of $\tilde{\mathcal{O}}(\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T))$, trivializing the existing results.
• Computer Science
ICML
• 2022
Novel confidence intervals are provided for the Nyström method and the sparse variational Gaussian process approximation method, which are established using novel interpretations of the approximate (surrogate) posterior variance of the models.
• Computer Science
NeurIPS
• 2021
This work proves an $\tilde{O}(\sqrt{\gamma_N/N})$ bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds, and is order optimal up to logarithmic factors for the cases where a lower bound on regret is known.
• Computer Science
• 2023
This work proposes a theoretical framework for studying reward learning and the associated optimal experiment design problem, and shows that the well-studied problem of Gaussian process (GP) bandit optimization is a special case of this framework, and that its bounds either improve or are competitive with known regret guarantees for the Matérn kernel.
• Computer Science
ArXiv
• 2023
A simple and unified analysis of stochastic partial monitoring is presented, and a single algorithm, information-directed sampling (IDS), is (nearly) worst-case rate optimal in all finite-action games.
• Computer Science
Proc. ACM Meas. Anal. Comput. Syst.
• 2023
A new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases for collecting feedback to reduce the bias and employs maximum variance reduction to select actions in batches within each phase, which can significantly reduce both communication cost and computation complexity in distributed kernelized bandits.

## References

Showing 1-10 of 46 references

• Computer Science
COLT
• 2017
This paper provides algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points.
A theoretical analysis shows that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $\Omega(\sqrt{T})$ and as $O(\sqrt{T})$ up to a $\sqrt{\log T}$ factor, and includes the first non-trivial lower bound for noisy BO.
• Computer Science
ICML
• 2010
This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
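The maximal information gain that this reference (and the surveyed paper's title) refers to is, for a Gaussian likelihood, $\gamma_T = \max_{|A| \le T} \frac{1}{2}\log\det(I + \sigma^{-2}K_A)$. A minimal sketch of the inner quantity, evaluated on an evenly spaced design rather than the maximising set:

```python
import numpy as np

def information_gain(K, noise_var=0.01):
    """I(y_A; f_A) = 0.5 * log det(I + sigma^-2 K_A) for Gaussian noise."""
    n = K.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + K / noise_var)
    return 0.5 * logdet

# RBF kernel matrix on an evenly spaced design (illustrative, not the maximiser).
x = np.linspace(0.0, 1.0, 50)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2)
gain = information_gain(K)
```

Computing $\gamma_T$ itself requires maximising over all size-$T$ subsets; bounding its growth in $T$ for common kernels is exactly what yields the explicit sublinear regret rates mentioned above.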
• Computer Science
COLT
• 2019
BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP.
• Computer Science
ICML
• 2021
In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…
• Computer Science
NeurIPS
• 2021
This work proves regret bounds for TS based on approximate GP posteriors, whose application to sparse GPs shows that the improvement in computational complexity can be achieved with no loss in terms of the order of regret performance.
• Computer Science
AISTATS
• 2017
Thompson sampling can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach.
• Computer Science
ICML
• 2017
This work provides two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), with corresponding regret bounds, and derives a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.
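Where GP-UCB plays the maximiser of an explicit confidence bound, GP-TS instead draws one random sample path from the GP and plays its argmax. A minimal sketch of that single step on a discretised domain, shown here under the prior for brevity (in practice the posterior mean and covariance after the observations so far would be used):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
# RBF kernel matrix with a small jitter for numerical stability.
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2) + 1e-6 * np.eye(50)

def thompson_step(mu, cov):
    """One GP-TS round: draw a sample path from the GP, play its argmax."""
    path = rng.multivariate_normal(mu, cov)
    return int(np.argmax(path))

arm = thompson_step(np.zeros(50), K)  # index of the point to query next
```

Repeating this step, conditioning the GP on each new noisy observation, gives the full GP-TS loop.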
• Computer Science, Mathematics
COLT
• 2008
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
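The "upper confidence bounds" idea this classical reference analyzes predates the GP setting; in its simplest k-armed form each arm's empirical mean is inflated by an exploration bonus $\sqrt{2\ln t / n_i}$. A minimal UCB1-style sketch on Bernoulli arms (the function name and parameters are illustrative):

```python
import math
import random

def ucb1(means, T=2000, seed=0):
    """UCB1 on a k-armed Bernoulli bandit: pull the arm with the highest
    empirical mean plus exploration bonus sqrt(2 ln t / n_i)."""
    random.seed(seed)
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, T + 1):
        if t <= k:
            i = t - 1  # initialise by pulling each arm once
        else:
            i = max(range(k),
                    key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2 * math.log(t) / counts[a]))
        counts[i] += 1
        sums[i] += 1.0 if random.random() < means[i] else 0.0
    return counts

pulls = ucb1([0.2, 0.5, 0.8], T=2000)  # best arm ends up pulled most often
```

The suboptimal arms are pulled only $O(\log T / \Delta_i^2)$ times, which is what drives the logarithmic regret characterization mentioned above.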
• Computer Science
ArXiv
• 2017
In this paper, the problem of maximizing a black-box function $f:\mathcal{X} \to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian Process (GP) prior, and high probability bounds on its simple and cumulative regret are established.