• Corpus ID: 239009860

# Gaussian Process Bandit Optimization with Few Batches

@inproceedings{Li2021GaussianPB,
  title={Gaussian Process Bandit Optimization with Few Batches},
  author={Zihan Li and Jonathan Scarlett},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  year={2021}
}
• Published in International Conference on…, 15 October 2021
• Computer Science
In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound O*(√(Tγ_T)) using O(log log T) batches within time horizon T, where the O*(·) notation hides dimension…
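The O(log log T) batch count can be achieved with a doubling-exponent query grid of the kind used in batched finite-arm bandit algorithms. The sketch below is illustrative only: the grid formula t_i = T^(1 - 2^-i) and the batch-count choice are standard constructions from that literature, not necessarily the exact grid used in this paper.

```python
import math

def batch_grid(T, num_batches):
    # Doubling-exponent grid from the batched-bandit literature:
    # batch i ends at roughly T^(1 - 2^-i); the last batch ends at T.
    ends = [math.ceil(T ** (1.0 - 2.0 ** -i)) for i in range(1, num_batches)]
    return ends + [T]

T = 10**6
B = math.ceil(math.log2(math.log2(T))) + 1  # O(log log T) batches suffice
print(B, batch_grid(T, B))
```

With T = 10^6 this uses only 6 batches, the first ending after roughly √T queries; each successive batch is stretched so that the final, longest batch can rely on an already-accurate posterior.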

## Citations

• ICML 2022: Instance-dependent regret lower bounds are derived for algorithms whose normalized cumulative regret vanishes uniformly over the function class, valid for several practically relevant kernelized bandit algorithms such as GP-UCB, GP-TS, and SupKernelUCB.
• ArXiv 2022: This work proposes GPN-UCB, a sequential upper confidence bound based algorithm, along with a general theoretical upper bound on the cumulative regret, and provides algorithm-independent lower bounds on the simple and cumulative regret, showing that GPN-UCB is near-optimal for chains and multi-output chains in broad cases of interest.
• 2022 IEEE International Symposium on Information Theory (ISIT): The LP-GP-UCB algorithm is proposed, which augments a Gaussian process surrogate model with local polynomial estimators of the function to construct a multi-scale upper confidence bound to guide the search for the optimizer.
• ArXiv 2022: This work proposes a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants; the algorithm is shown to be robust against a variety of adversarial attacks.
• ArXiv 2022: Two algorithms based on the Thompson sampling (TS) policy, Sample-Then-Optimize Batch Neural TS (STO-BNTS) and STO-BNTS-Linear, are introduced; regret upper bounds are derived for batch evaluations, and insights from batch BO and the NTK are used to show that they are asymptotically no-regret under certain conditions.
• Several upper bounds on regret are discussed, none of which appears order-optimal, and a conjecture on the order-optimal regret bound is provided.
• COLT 2021: The question of online confidence intervals in the RKHS setting is formalized, and the main challenge seems to stem from the online (sequential) nature of the observation points.
• ICML 2022: Novel confidence intervals are provided for the Nyström method and the sparse variational Gaussian process approximation method, established using novel interpretations of the approximate (surrogate) posterior variance of the models.
• ICML 2022: Algorithms with sub-linear regret guarantees that address the dilemma of selecting new function queries while waiting for randomly delayed feedback are proposed.
• ArXiv 2022: Non-asymptotic error bounds are derived on the difference between an overparameterized neural net and its corresponding neural tangent kernel, and an algorithm with a provably sublinear regret bound that is also efficient in the finite regime is proposed.

## References

SHOWING 1-10 OF 30 REFERENCES

• NeurIPS 2021: This work proves an Õ(√(γ_N/N)) bound on the simple regret of a pure exploration algorithm, significantly tighter than the existing bounds and order-optimal up to logarithmic factors in the cases where a lower bound on regret is known.
• ICML 2010: This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bounds its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
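As a concrete illustration of the GP-UCB rule this reference analyzes, here is a minimal numpy sketch on a 1-D grid with a squared-exponential kernel and a fixed confidence parameter β. The theory uses a schedule β_t growing with t; the lengthscale, noise level, and objective below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 ls^2))
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise_sd=0.05):
    # GP posterior mean and variance at test points Xs given noisy data (X, y)
    K = rbf(X, X) + noise_sd**2 * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0), 0.0, None)
    return mu, var

def gp_ucb(f, grid, T, beta=2.0, noise_sd=0.05, seed=0):
    rng = np.random.default_rng(seed)
    X = np.array([grid[len(grid) // 2]])               # arbitrary first query
    y = np.array([f(X[0]) + noise_sd * rng.standard_normal()])
    for _ in range(T - 1):
        mu, var = gp_posterior(X, y, grid, noise_sd)
        x = grid[np.argmax(mu + np.sqrt(beta * var))]  # UCB acquisition rule
        X = np.append(X, x)
        y = np.append(y, f(x) + noise_sd * rng.standard_normal())
    return X, y

grid = np.linspace(0.0, 1.0, 101)
X, y = gp_ucb(lambda x: -(x - 0.3) ** 2, grid, T=30)
print(float(y.max()))  # best observed value; should be near the optimum f(0.3) = 0
```

The acquisition step is the whole algorithm: query where the posterior mean plus a variance-based exploration bonus is largest, so early rounds explore high-uncertainty regions and later rounds exploit the estimated maximizer.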
• ICML 2017: This work provides two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), derives corresponding regret bounds, and derives a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.
• COLT 2019: BKB (budgeted kernelized bandit) is a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP.
• ICML 2021: This paper considers the problem of finding a single “good action” according to a known pre-specified threshold, and introduces several good-action identification algorithms that exploit knowledge of the threshold.
• ICML 2021: In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…
• COLT 2017: This paper provides algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points.
• UAI 2013: This work proposes KernelUCB, a kernelised UCB algorithm, gives a cumulative regret bound through a frequentist analysis, and improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.
• AISTATS 2021: General bounds on $\gamma_T$ are provided based on the decay rate of the eigenvalues of the GP kernel; their specialisation to commonly used kernels improves the existing bounds on $\gamma_T$, and consequently the regret bounds relying on $\gamma_T$, under numerous settings.
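The quantity these bounds control, the maximal information gain γ_T = max over T points of ½ log det(I + σ⁻² K_T), can be approximated numerically by greedy selection, which is near-optimal by submodularity. A small numpy sketch follows; the RBF kernel, lengthscale, and noise level are illustrative choices, not from the reference.

```python
import numpy as np

def greedy_info_gain(K, T, noise_var=0.01):
    # Greedy surrogate for gamma_T: repeatedly query the point of largest
    # posterior variance; by the chain rule the information gain is
    # sum_t 0.5 * log(1 + var_{t-1}(x_t) / noise_var).
    n = K.shape[0]
    S, gain = [], 0.0
    for _ in range(T):
        if S:
            Kss = K[np.ix_(S, S)] + noise_var * np.eye(len(S))
            Ksx = K[S, :]
            var = np.diag(K) - np.einsum('ij,ij->j', Ksx, np.linalg.solve(Kss, Ksx))
        else:
            var = np.diag(K).copy()
        j = int(np.argmax(var))
        gain += 0.5 * np.log(1.0 + var[j] / noise_var)
        S.append(j)
    return gain

x = np.linspace(0.0, 1.0, 200)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2)  # RBF kernel matrix
for T in (5, 10, 20):
    print(T, round(greedy_info_gain(K, T), 2))
```

For the smooth RBF kernel the printed gains grow increasingly slowly in T, consistent with polylogarithmic bounds on γ_T; when T equals the number of grid points, the chain-rule sum recovers ½ log det(I + σ⁻² K) exactly.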
• 2022 IEEE International Symposium on Information Theory (ISIT): The LP-GP-UCB algorithm is proposed, which augments a Gaussian process surrogate model with local polynomial estimators of the function to construct a multi-scale upper confidence bound to guide the search for the optimizer.