# Instance-Dependent Regret Analysis of Kernelized Bandits

```bibtex
@inproceedings{Shekhar2022InstanceDependentRA,
  title     = {Instance-Dependent Regret Analysis of Kernelized Bandits},
  author    = {Shubhanshu Shekhar and Tara Javidi},
  booktitle = {International Conference on Machine Learning},
  year      = {2022}
}
```

We study the problem of designing an adaptive strategy for querying a noisy zeroth-order oracle to efficiently learn about the optimizer of an unknown function f. To make the problem tractable, we assume that f lies in the reproducing kernel Hilbert space (RKHS) associated with a known kernel K, with its norm bounded by M < ∞. Prior results, working in a minimax framework, have characterized the worst-case (over all functions in the problem class) limits on the regret achievable by any algorithm…

## References

Showing 1–10 of 34 references.

### Finite-Time Analysis of Kernelised Contextual Bandits

- Computer Science, UAI
- 2013

This work proposes KernelUCB, a kernelised UCB algorithm, and gives a cumulative regret bound through a frequentist analysis; it improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.
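As a rough illustration of what a kernelised UCB rule like the one summarised above computes (a minimal sketch of my own, not the paper's implementation; the function name, `lam`, and `beta` parameters are assumptions, and the unit prior variance assumes a kernel with k(s, s) = 1 such as RBF):

```python
import numpy as np

def kernel_ucb_scores(K_xx, K_sx, y, lam=1.0, beta=2.0):
    """Hypothetical sketch of a kernelised UCB acquisition rule.

    K_xx : (n, n) kernel matrix over the n points queried so far.
    K_sx : (m, n) kernel values between m candidate arms and queried points.
    y    : (n,) noisy rewards observed so far.
    Returns one UCB score per candidate arm: mean + beta * std.
    """
    n = K_xx.shape[0]
    A = K_xx + lam * np.eye(n)           # regularised Gram matrix
    alpha = np.linalg.solve(A, y)        # kernel ridge regression weights
    mean = K_sx @ alpha                  # reward estimate per candidate arm
    # posterior variance: k(s,s) - k(s,X) (K + lam I)^{-1} k(X,s), with k(s,s) = 1
    var = 1.0 - np.einsum('ij,ji->i', K_sx, np.linalg.solve(A, K_sx.T))
    std = np.sqrt(np.clip(var, 0.0, None))
    return mean + beta * std
```

The next query would be the candidate arm with the highest score, trading off the estimated reward (`mean`) against uncertainty (`std`).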

### Instance-Dependent Bounds for Zeroth-order Lipschitz Optimization with Error Certificates

- Mathematics, NeurIPS
- 2021

The definition of the r-covering number of a subset E of R implied by (Wainwright, 2019, Definition 5.1) is slightly stronger than the one used in this paper, because the elements x1, …, xN of r-covers belong to E rather than just R.

### On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

- Computer Science, ICML
- 2021

In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…

### On Information Gain and Regret Bounds in Gaussian Process Bandits

- Computer Science, AISTATS
- 2021

General bounds on $\gamma_T$ are provided based on the decay rate of the eigenvalues of the GP kernel; their specialisation for commonly used kernels improves the existing bounds on $\gamma_T$ and, consequently, the regret bounds relying on $\gamma_T$ in numerous settings.

### Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS

- Computer Science, 2022 IEEE International Symposium on Information Theory (ISIT)
- 2022

The LP-GP-UCB algorithm is proposed which augments a Gaussian process surrogate model with local polynomial estimators of the function to construct a multi-scale upper confidence bound to guide the search for the optimizer.

### X-Armed Bandits

- Computer Science, Mathematics, J. Mach. Learn. Res.
- 2011

We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a…

### On Kernelized Multi-armed Bandits

- Computer Science, ICML
- 2017

This work provides two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), with corresponding regret bounds, and derives a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.
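One round of a GP Thompson-sampling scheme like GP-TS can be sketched as follows (my own minimal illustration under standard GP-posterior formulas, not the paper's code; the function name and `noise` parameter are assumptions):

```python
import numpy as np

def gp_thompson_step(K_xx, K_sx, K_ss, y, noise=0.1, rng=None):
    """Illustrative single round of GP Thompson sampling.

    Draws one function sample from the GP posterior over m candidate arms
    and returns the index of the arm that maximises that sample.
    K_xx : (n, n) kernel matrix over queried points; K_sx : (m, n);
    K_ss : (m, m) kernel matrix over candidates; y : (n,) noisy rewards.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = K_xx.shape[0]
    A = K_xx + noise * np.eye(n)
    sol = np.linalg.solve(A, K_sx.T)             # (n, m)
    mean = sol.T @ y                             # posterior mean at candidates
    cov = K_ss - K_sx @ sol                      # posterior covariance at candidates
    cov += 1e-9 * np.eye(cov.shape[0])           # jitter for numerical stability
    sample = rng.multivariate_normal(mean, cov)  # one posterior draw
    return int(np.argmax(sample))
```

Randomising over posterior draws, rather than adding an explicit exploration bonus as in GP-UCB, is what distinguishes the Thompson-sampling variant.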

### Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

- Computer Science, COLT
- 2017

This paper provides algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points.

### Open Problem: Tight Online Confidence Intervals for RKHS Elements

- Computer Science, COLT
- 2021

The question of online confidence intervals in the RKHS setting is formalized, and the main challenge appears to stem from the online (sequential) nature of the observation points.

### Smooth Bandit Optimization: Generalization to Hölder Space

- Computer Science, Mathematics, AISTATS
- 2021

It is demonstrated that the proposed algorithm can exploit higher-order smoothness of the function by deriving a regret upper bound of $\tilde{O}(T^{\frac{d+\alpha}{d+2\alpha}})$ when $\alpha > 1$, which matches the existing lower bound.
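As a quick arithmetic check of the exponent in the bound above (my own sanity check, not from the paper; the helper name is made up), note that at $\alpha = 1$ it recovers the classical Lipschitz rate $T^{(d+1)/(d+2)}$:

```python
from fractions import Fraction

def holder_regret_exponent(d, alpha):
    """Exponent in the ~O(T^{(d+alpha)/(d+2*alpha)}) regret bound."""
    return Fraction(d + alpha, d + 2 * alpha)

# alpha = 1 (Lipschitz) in d = 2 gives T^{3/4}, the classical (d+1)/(d+2) rate.
print(holder_regret_exponent(2, 1))  # 3/4
```

Larger $\alpha$ (more smoothness) pushes the exponent toward 1/2, i.e. toward the parametric rate.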