• Corpus ID: 239009860

Gaussian Process Bandit Optimization with Few Batches

@inproceedings{Li2021GaussianPB,
  title={Gaussian Process Bandit Optimization with Few Batches},
  author={Zihan Li and Jonathan Scarlett},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  year={2021}
}
  • Zihan Li, J. Scarlett
  • Published in International Conference on…, 15 October 2021
  • Computer Science
In this paper, we consider the problem of black-box optimization using Gaussian Process (GP) bandit optimization with a small number of batches. Assuming the unknown function has a low norm in the Reproducing Kernel Hilbert Space (RKHS), we introduce a batch algorithm inspired by batched finite-arm bandit algorithms, and show that it achieves the cumulative regret upper bound O*(√(Tγ_T)) using O(log log T) batches within time horizon T, where the O*(·) notation hides dimension…
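The O(log log T) batch count comes from a schedule in which batch lengths grow doubly exponentially in the exponent, as in batched finite-arm bandits. Below is a minimal, hypothetical numpy sketch, not the paper's actual algorithm: it assumes a finite candidate set in [0, 1], a squared-exponential kernel with unit prior variance, a fixed confidence parameter `beta`, and batch lengths of roughly T^(1 − 2^(−i)); all names and parameter values are illustrative.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=0.2):
    # Squared-exponential kernel with unit prior variance on scalar inputs.
    return np.exp(-0.5 * ((x[:, None] - y[None, :]) / lengthscale) ** 2)

def gp_posterior(X_obs, y_obs, X_cand, noise=0.1):
    # GP posterior mean and standard deviation at the candidate points.
    K = rbf_kernel(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    k_star = rbf_kernel(X_obs, X_cand)
    K_inv = np.linalg.inv(K)
    mean = k_star.T @ K_inv @ y_obs
    var = 1.0 - np.einsum("ij,ik,kj->j", k_star, K_inv, k_star)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def batched_gp_elimination(f, T, n_cand=50, beta=2.0, noise=0.1, seed=0):
    # Batch lengths grow as T^(1 - 2^-i), so O(log log T) batches cover T.
    rng = np.random.default_rng(seed)
    cand = np.linspace(0.0, 1.0, n_cand)
    active = np.arange(n_cand)          # indices of surviving candidate arms
    X_obs, y_obs = [], []
    t, i = 0, 1
    while t < T:
        batch_len = min(int(np.ceil(T ** (1.0 - 2.0 ** (-i)))), T - t)
        for idx in rng.choice(active, size=batch_len):
            X_obs.append(cand[idx])
            y_obs.append(f(cand[idx]) + noise * rng.standard_normal())
        t += batch_len
        # One posterior update per batch, then eliminate every arm whose
        # upper confidence bound falls below the best lower confidence bound.
        mean, std = gp_posterior(np.array(X_obs), np.array(y_obs), cand, noise)
        ucb, lcb = mean + beta * std, mean - beta * std
        active = active[ucb[active] >= lcb[active].max()]
        i += 1
    return cand[active]                 # points still considered near-optimal
```

Because the posterior (and the elimination step) is updated only once per batch, the number of adaptive rounds grows like log log T rather than T.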

Figures from this paper

Instance-Dependent Regret Analysis of Kernelized Bandits

First, instance-dependent regret lower bounds are derived for algorithms with uniformly (over the function class) vanishing normalized cumulative regret, valid for several practically relevant kernelized bandit algorithms such as GP-UCB, GP-TS, and SupKernelUCB.

Regret Bounds for Noise-Free Cascaded Kernelized Bandits

This work proposes a sequential upper-confidence-bound-based algorithm, GPN-UCB, along with a general theoretical upper bound on the cumulative regret, and provides algorithm-independent lower bounds on the simple regret and cumulative regret, showing that GPN-UCB is near-optimal for chains and multi-output chains in broad cases of interest.

Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS

  • S. Shekhar, T. Javidi
  • Computer Science
    2022 IEEE International Symposium on Information Theory (ISIT)
  • 2022
The LP-GP-UCB algorithm is proposed which augments a Gaussian process surrogate model with local polynomial estimators of the function to construct a multi-scale upper confidence bound to guide the search for the optimizer.

A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

This work proposes a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants, and shows that the algorithm is robust against a variety of adversarial attacks.

Sample-Then-Optimize Batch Neural Thompson Sampling

Two algorithms based on the Thompson sampling (TS) policy, named Sample-Then-Optimize Batch Neural TS (STO-BNTS) and STO-BNTS-Linear, are introduced; regret upper bounds are derived for them with batch evaluations, and insights from batch BO and the NTK are used to show that they are asymptotically no-regret under certain conditions.
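For comparison, the Thompson sampling policy underlying STO-BNTS can be sketched in its classical GP form over a finite arm set: draw one posterior sample per batch slot and play each sample's argmax (STO-BNTS itself replaces the explicit GP sample with a trained neural network via the sample-then-optimize trick). A hypothetical numpy sketch, assuming a squared-exponential kernel with illustrative parameters:

```python
import numpy as np

def gp_thompson_batch(X_obs, y_obs, X_cand, batch_size=4, noise=0.1, seed=0):
    # One posterior sample per batch slot; each slot plays its sample's argmax.
    rng = np.random.default_rng(seed)
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / 0.2) ** 2)
    K = k(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    k_star = k(X_obs, X_cand)
    K_inv = np.linalg.inv(K)
    mean = k_star.T @ K_inv @ y_obs
    cov = k(X_cand, X_cand) - k_star.T @ K_inv @ k_star
    cov += 1e-8 * np.eye(len(X_cand))   # jitter for numerical stability
    samples = rng.multivariate_normal(mean, cov, size=batch_size)
    return np.array([X_cand[np.argmax(s)] for s in samples])
```

Drawing independent posterior samples is what lets a batch of queries be chosen without intermediate feedback.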

Regret Bounds for Noise-Free Kernel-Based Bandits

Several upper bounds on regret are discussed, none of which seems order-optimal, and a conjecture on the order-optimal regret bound is provided.

Open Problem: Tight Online Confidence Intervals for RKHS Elements

The question of online confidence intervals in the RKHS setting is formalized and the main challenge seems to stem from the online (sequential) nature of the observation points.

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

Novel confidence intervals are provided for the Nyström method and the sparse variational Gaussian process approximation method, which are established using novel interpretations of the approximate (surrogate) posterior variance of the models.

Bayesian Optimization under Stochastic Delayed Feedback

Algorithms with sub-linear regret guarantees that address the dilemma of selecting new function queries while waiting for randomly delayed feedback are proposed.

Provably and Practically Efficient Neural Contextual Bandits

The non-asymptotic error bounds are derived on the difference between an overparameterized neural net and its corresponding neural tangent kernel and an algorithm with a provably sublinear regret bound that is also efficient in the finite regime is proposed.

References

SHOWING 1-10 OF 30 REFERENCES

Optimal Order Simple Regret for Gaussian Process Bandits

This work proves an Õ(√(γ_N/N)) bound on the simple regret performance of a pure exploration algorithm that is significantly tighter than the existing bounds and is order-optimal up to logarithmic factors for the cases where a lower bound on regret is known.

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

This work analyzes GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design and obtaining explicit sublinear regret bounds for many commonly used covariance functions.
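The GP-UCB selection rule itself is compact: fit the GP posterior and play the point maximising the mean plus a scaled posterior standard deviation. Below is a minimal numpy sketch over a finite candidate set, assuming a squared-exponential kernel with unit prior variance and a fixed (rather than the theoretically prescribed time-varying) confidence parameter `beta`; the names and values are illustrative:

```python
import numpy as np

def gp_ucb_select(X_obs, y_obs, X_cand, beta=2.0, noise=0.1, lengthscale=0.2):
    # Play the candidate maximising mean + sqrt(beta) * posterior std.
    k = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)
    K = k(X_obs, X_obs) + noise ** 2 * np.eye(len(X_obs))
    k_star = k(X_obs, X_cand)
    K_inv = np.linalg.inv(K)
    mean = k_star.T @ K_inv @ y_obs
    var = np.maximum(1.0 - np.einsum("ij,ik,kj->j", k_star, K_inv, k_star), 1e-12)
    return X_cand[np.argmax(mean + np.sqrt(beta) * np.sqrt(var))]
```

The trade-off is visible in the two terms: the mean favours exploitation near good observations, while the standard deviation favours exploration of poorly sampled regions.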

On Kernelized Multi-armed Bandits

This work provides two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), derives corresponding regret bounds, and derives a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

BKB (budgeted kernelized bandit) is a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence a near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP.

Lenient Regret and Good-Action Identification in Gaussian Process Bandits

This paper considers the problem of finding a single “good action” according to a known pre-specified threshold, and introduces several good-action identification algorithms that exploit knowledge of the threshold.

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

This paper provides algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points.

Finite-Time Analysis of Kernelised Contextual Bandits

This work proposes KernelUCB, a kernelised UCB algorithm, gives a cumulative regret bound through a frequentist analysis, and improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.

On Information Gain and Regret Bounds in Gaussian Process Bandits

General bounds on $\gamma_T$ are provided based on the decay rate of the eigenvalues of the GP kernel; their specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and consequently the regret bounds relying on $\gamma_T$, under numerous settings.

Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS

  • S. Shekhar, T. Javidi
  • Computer Science
    2022 IEEE International Symposium on Information Theory (ISIT)
  • 2022
The LP-GP-UCB algorithm is proposed which augments a Gaussian process surrogate model with local polynomial estimators of the function to construct a multi-scale upper confidence bound to guide the search for the optimizer.