Corpus ID: 221703244

On Information Gain and Regret Bounds in Gaussian Process Bandits

Authors: Sattar Vakili, Kia Khezeli, Victor Picheny
Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be viewed as a continuum-armed bandit problem. Upper bounds on the regret of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). The regret bounds often rely on… 
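The regret bounds discussed in the abstract are typically stated in terms of the maximal information gain $\gamma_T$, which for a set of observed points equals $\frac{1}{2}\log\det(I + \sigma^{-2}K)$. A minimal sketch of evaluating this quantity for a given point set, assuming an RBF kernel and a noise level chosen purely for illustration:

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Squared-exponential kernel matrix for points X of shape (n, d).
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    return np.exp(-sq / (2 * lengthscale**2))

def information_gain(X, noise_var=0.1):
    # I(y; f) = 1/2 log det(I + sigma^-2 K) for observations at X.
    K = rbf_kernel(X)
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / noise_var)
    return 0.5 * logdet

X = np.random.default_rng(0).uniform(0, 1, (20, 1))
print(information_gain(X))  # grows as points are added or spread out
```

The maximal information gain $\gamma_T$ is the supremum of this quantity over all point sets of size $T$; the sketch only evaluates it for one fixed set.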

Tables from this paper

Lenient Regret and Good-Action Identification in Gaussian Process Bandits

This paper considers the problem of finding a single “good action” according to a known pre-specified threshold, and introduces several good-action identification algorithms that exploit knowledge of the threshold.

Tighter Bounds on the Log Marginal Likelihood of Gaussian Process Regression Using Conjugate Gradients

It is shown that approximate maximum likelihood learning of model parameters by maximising the lower bound retains many benefits of the sparse variational approach while reducing the bias introduced into hyperparameter learning.

A Bayesian Approach for Stochastic Continuum-armed Bandit with Long-term Constraints

A generic method using a Bayesian approach based on a class of penalty functions is proposed, and it is proved to achieve sublinear regret with respect to the global optimum and sublinear constraint violation (CV), matching the best results of previous methods.

Weighted Gaussian Process Bandits for Non-stationary Environments

WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression with general weights, is developed, yielding the first (frequentist) sublinear regret guarantee for weighted time-varying bandits with general nonlinear rewards.

Delayed Feedback in Kernel Bandits

This work considers a kernel bandit problem under stochastically delayed feedback and proposes an algorithm whose regret bound significantly improves on the state-of-the-art bound of $\tilde{\mathcal{O}}(\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T))$.

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

Novel confidence intervals are provided for the Nyström method and the sparse variational Gaussian process approximation method, which are established using novel interpretations of the approximate (surrogate) posterior variance of the models.

Optimal Order Simple Regret for Gaussian Process Bandits

This work proves an $\tilde{O}(\sqrt{\gamma_N/N})$ bound on the simple regret of a pure exploration algorithm that is significantly tighter than the existing bounds and is order-optimal up to logarithmic factors in the cases where a lower bound on regret is known.

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws

This work proposes a theoretical framework for studying reward learning and the associated optimal experiment design problem, and shows that the well-studied problem of Gaussian process (GP) bandit optimization is a special case of this framework, and that its bounds either improve or are competitive with known regret guarantees for the Matérn kernel.

Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

A simple and unified analysis of stochastic partial monitoring is presented, and a single algorithm, information-directed sampling (IDS), is (nearly) worst-case rate optimal in all finite-action games.

(Private) Kernelized Bandits with Distributed Biased Feedback

A new distributed phase-then-batch-based elimination (DPBE) algorithm is proposed, which samples users in phases to collect feedback with reduced bias and employs maximum variance reduction to select actions in batches within each phase; it can significantly reduce both the communication cost and the computation complexity of distributed kernelized bandits.

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

This paper provides algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points.

Tight Regret Bounds for Bayesian Optimization in One Dimension

A theoretical analysis shows that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $\Omega(\sqrt{T})$ and $O(\sqrt{T \log T})$, matching up to a $\sqrt{\log T}$ factor, and it includes the first non-trivial lower bound for noisy BO.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

This work analyzes GP-UCB, an intuitive upper-confidence-based algorithm, and bounds its cumulative regret in terms of the maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
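The GP-UCB rule described above picks, at each round, the point maximizing the posterior mean plus a scaled posterior standard deviation. A minimal sketch on a 1-D grid, where the RBF kernel, the noise level, and the $\beta_t$ schedule are all illustrative assumptions rather than the paper's exact constants:

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # RBF kernel between 1-D point arrays a and b (unit prior variance).
    return np.exp(-(a[:, None] - b[None, :])**2 / (2 * ls**2))

def gp_posterior(X, y, Xs, noise_var=0.01):
    # Posterior mean and variance of the GP at test points Xs.
    K = rbf(X, X) + noise_var * np.eye(len(X))
    Ks = rbf(Xs, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y
    var = 1.0 - np.sum((Ks @ Kinv) * Ks, axis=1)
    return mu, np.maximum(var, 1e-12)

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x)            # toy unknown objective
grid = np.linspace(0, 2, 200)
X = [rng.uniform(0, 2)]
y = [f(X[0]) + 0.1 * rng.standard_normal()]
for t in range(1, 30):
    mu, var = gp_posterior(np.array(X), np.array(y), grid)
    beta = 2 * np.log(len(grid) * (t + 1)**2)   # illustrative beta_t
    x_next = grid[np.argmax(mu + np.sqrt(beta * var))]  # UCB rule
    X.append(x_next)
    y.append(f(x_next) + 0.1 * rng.standard_normal())
print(max(y))  # best observed (noisy) value approaches the optimum
```

The $\sqrt{\beta_t \sigma_t^2(x)}$ term is the exploration bonus; the regret analysis in the paper ties the sum of these bonuses to the maximal information gain.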

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

BKB (budgeted kernelized bandit) is a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence a near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or the covariance of the GP.

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…

Scalable Thompson Sampling using Sparse Gaussian Process Models

This work proves regret bounds for TS based on approximate GP posteriors, whose application to sparse GPs shows that the improvement in computational complexity can be achieved with no loss in terms of the order of regret performance.

Linear Thompson Sampling Revisited

Thompson sampling can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach.
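The Thompson sampling idea summarized above, in its GP form (GP-TS, as in the "Scalable Thompson Sampling" and "On Kernelized Multi-armed Bandits" entries), draws one sample path from the posterior and plays its argmax. A hedged sketch on a discrete grid, with kernel, noise level, and toy objective all chosen for illustration:

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # RBF kernel between 1-D point arrays a and b (unit prior variance).
    return np.exp(-(a[:, None] - b[None, :])**2 / (2 * ls**2))

def posterior(X, y, Xs, noise_var=0.01):
    # Full posterior mean and covariance at the grid Xs.
    K = rbf(X, X) + noise_var * np.eye(len(X))
    Ks = rbf(Xs, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y
    cov = rbf(Xs, Xs) - Ks @ Kinv @ Ks.T
    return mu, cov + 1e-6 * np.eye(len(Xs))  # jitter for sampling

rng = np.random.default_rng(2)
f = lambda x: -(x - 1.3)**2            # toy objective, optimum at x = 1.3
grid = np.linspace(0, 2, 100)
X = [rng.uniform(0, 2)]
y = [f(X[0]) + 0.05 * rng.standard_normal()]
for _ in range(25):
    mu, cov = posterior(np.array(X), np.array(y), grid)
    sample = rng.multivariate_normal(mu, cov)  # one posterior draw
    x_next = grid[np.argmax(sample)]           # act greedily on the draw
    X.append(x_next)
    y.append(f(x_next) + 0.05 * rng.standard_normal())
print(grid[np.argmin(np.abs(grid - X[-1]))])  # last point queried
```

The randomness of the posterior draw plays the role the explicit confidence width plays in GP-UCB, which is where the extra $\sqrt{d}$ factor mentioned above comes from in the linear case.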

On Kernelized Multi-armed Bandits

This work provides two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), derives corresponding regret bounds, and derives a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.

Stochastic Linear Optimization under Bandit Feedback

A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.

Gaussian Process bandits with adaptive discretization

In this paper, the problem of maximizing a black-box function $f:\mathcal{X} \to \mathbb{R}$ is studied in the Bayesian framework with a Gaussian Process (GP) prior, and high-probability bounds on its simple and cumulative regret are established.