# Gaussian Process Bandit Optimization with Few Batches

```bibtex
@inproceedings{Li2021GaussianPB,
  title     = {Gaussian Process Bandit Optimization with Few Batches},
  author    = {Zihan Li and Jonathan Scarlett},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2021}
}
```

In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound $O^*(\sqrt{T\gamma_T})$ using $O(\log \log T)$ batches within time horizon $T$, where the $O^*(\cdot)$ notation hides dimension…
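The $O(\log \log T)$ batch count is characteristic of doubly-exponential batch grids, a standard device in batched bandit algorithms. A minimal sketch of why so few batches suffice (the grid $t_i = \lceil T^{1-2^{-i}} \rceil$ below is illustrative of this general device, not necessarily the paper's exact schedule):

```python
import math

def batch_schedule(T):
    """Doubly-exponential batch grid t_i = ceil(T^(1 - 2^-i)).

    With B = ceil(log2 log2 T) batches, 2^-B <= 1/log2(T), so
    T^(2^-B) <= 2 and the grid already covers at least T/2 rounds;
    one final batch running to the horizon then gives O(log log T)
    batches in total.
    """
    B = math.ceil(math.log2(math.log2(T)))
    ends = [math.ceil(T ** (1.0 - 2.0 ** -i)) for i in range(1, B + 1)]
    ends.append(T)  # last batch runs out the remaining <= T/2 rounds
    return ends

for T in (10**4, 10**8):
    ends = batch_schedule(T)
    print(T, len(ends), ends)
```

Even for a horizon of $T = 10^8$, this schedule uses only six batches, which is the qualitative point behind the $O(\log \log T)$ guarantee.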

## 14 Citations

### Instance-Dependent Regret Analysis of Kernelized Bandits

- Computer Science, ICML
- 2022

First, instance-dependent regret lower bounds for algorithms with uniformly (over the function class) vanishing normalized cumulative regret are derived, valid for several practically relevant kernelized bandit algorithms such as GP-UCB, GP-TS, and SupKernelUCB.

### Regret Bounds for Noise-Free Cascaded Kernelized Bandits

- Computer Science, ArXiv
- 2022

This work proposes a sequential upper-confidence-bound-based algorithm GPN-UCB along with a general theoretical upper bound on the cumulative regret and provides algorithm-independent lower bounds on the simple regret and cumulative regret, showing that GPN-UCB is near-optimal for chains and multi-output chains in broad cases of interest.

### Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS

- Computer Science, 2022 IEEE International Symposium on Information Theory (ISIT)
- 2022

The LP-GP-UCB algorithm is proposed which augments a Gaussian process surrogate model with local polynomial estimators of the function to construct a multi-scale upper confidence bound to guide the search for the optimizer.

### A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

- Computer Science, ArXiv
- 2022

This work proposes a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants, and shows that the algorithm is robust against a variety of adversarial attacks.

### Sample-Then-Optimize Batch Neural Thompson Sampling

- Computer Science, ArXiv
- 2022

Two algorithms based on the Thompson sampling (TS) policy, named Sample-Then-Optimize Batch Neural TS (STO-BNTS) and STO-BNTS-Linear, are introduced; regret upper bounds are derived for these algorithms with batch evaluations, and insights from batch BO and the NTK are used to show that they are asymptotically no-regret under certain conditions.

### Regret Bounds for Noise-Free Kernel-Based Bandits

- Computer Science
- 2020

Several upper bounds on regret are discussed, none of which seems order-optimal, and a conjecture on the order-optimal regret bound is provided.

### Open Problem: Tight Online Confidence Intervals for RKHS Elements

- Computer Science, COLT
- 2021

The question of online confidence intervals in the RKHS setting is formalized, and the main challenge seems to stem from the online (sequential) nature of the observation points.

### Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

- Computer Science, ICML
- 2022

Novel confidence intervals are provided for the Nyström method and the sparse variational Gaussian process approximation method, which are established using novel interpretations of the approximate (surrogate) posterior variance of the models.

### Bayesian Optimization under Stochastic Delayed Feedback

- Computer Science, ICML
- 2022

Algorithms with sub-linear regret guarantees that address the dilemma of selecting new function queries while waiting for randomly delayed feedback are proposed.

### Provably and Practically Efficient Neural Contextual Bandits

- Computer Science, ArXiv
- 2022

The non-asymptotic error bounds are derived on the difference between an overparameterized neural net and its corresponding neural tangent kernel and an algorithm with a provably sublinear regret bound that is also efficient in the finite regime is proposed.

## References

Showing 1–10 of 30 references

### Optimal Order Simple Regret for Gaussian Process Bandits

- Computer Science, NeurIPS
- 2021

This work proves a $\tilde{O}(\sqrt{\gamma_N/N})$ bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds and is order-optimal up to logarithmic factors for the cases where a lower bound on regret is known.

### Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

- Computer Science, ICML
- 2010

This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.

### On Kernelized Multi-armed Bandits

- Computer Science, ICML
- 2017

This work provides two new Gaussian-process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), derives corresponding regret bounds, and establishes a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.

### Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

- Computer Science, COLT
- 2019

BKB (budgeted kernelized bandit) is a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence a near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP.

### Lenient Regret and Good-Action Identification in Gaussian Process Bandits

- Computer Science, ICML
- 2021

This paper considers the problem of finding a single “good action” according to a known pre-specified threshold, and introduces several good-action identification algorithms that exploit knowledge of the threshold.

### On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

- Computer Science, ICML
- 2021

In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…

### Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

- Computer Science, COLT
- 2017

This paper provides algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points.
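For reference, under a common convention (with maximizer $x^* \in \arg\max_x f(x)$, queried points $x_1, \dots, x_T$, and a single reported point $x^{(T)}$), the two regret notions are

$$
r(T) = f(x^*) - f\big(x^{(T)}\big), \qquad
R_T = \sum_{t=1}^{T} \big( f(x^*) - f(x_t) \big),
$$

so a bound on the average cumulative regret $R_T / T$ immediately yields a simple regret bound when $x^{(T)}$ is chosen as the best (or an average-quality) queried point.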

### Finite-Time Analysis of Kernelised Contextual Bandits

- Computer Science, UAI
- 2013

This work proposes KernelUCB, a kernelised UCB algorithm, gives a cumulative regret bound through a frequentist analysis, and improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.

### On Information Gain and Regret Bounds in Gaussian Process Bandits

- Computer Science, AISTATS
- 2021

General bounds on $\gamma_T$ are provided based on the decay rate of the eigenvalues of the GP kernel, whose specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and consequently the regret bounds relying on $\gamma_T$, under numerous settings.

### Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS

- Computer Science, 2022 IEEE International Symposium on Information Theory (ISIT)
- 2022

The LP-GP-UCB algorithm is proposed which augments a Gaussian process surrogate model with local polynomial estimators of the function to construct a multi-scale upper confidence bound to guide the search for the optimizer.