Instance-Dependent Regret Analysis of Kernelized Bandits

@inproceedings{Shekhar2022InstanceDependentRA,
  title={Instance-Dependent Regret Analysis of Kernelized Bandits},
  author={Shubhanshu Shekhar and Tara Javidi},
  booktitle={International Conference on Machine Learning},
  year={2022}
}
We study the problem of designing an adaptive strategy for querying a noisy zeroth-order oracle to efficiently learn about the optimizer of an unknown function $f$. To make the problem tractable, we assume that $f$ lies in the reproducing kernel Hilbert space (RKHS) associated with a known kernel $K$, with its norm bounded by $M < \infty$. Prior results, working in a minimax framework, have characterized the worst-case (over all functions in the problem class) limits on regret achievable by any algorithm…
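To make the problem class concrete: any finite kernel expansion $f = \sum_i a_i K(x_i, \cdot)$ lies in the RKHS of $K$, and its squared RKHS norm is $a^\top G a$, where $G$ is the Gram matrix of the centers. The minimal sketch below checks the constraint $\|f\| \le M$ and wraps $f$ in a noisy zeroth-order oracle. The squared-exponential kernel, its length-scale, the Gaussian noise model, and the names `rkhs_norm`, `evaluate`, and `noisy_oracle` are illustrative assumptions, not taken from the paper.

```python
import math
import random

def se_kernel(x, y, lengthscale=1.0):
    """Squared-exponential kernel on the real line (an assumed choice of K)."""
    return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))

def rkhs_norm(centers, coeffs, kernel=se_kernel):
    """Norm of f = sum_i coeffs[i] * K(centers[i], .), i.e. sqrt(a^T G a)."""
    sq = sum(ai * aj * kernel(xi, xj)
             for ai, xi in zip(coeffs, centers)
             for aj, xj in zip(coeffs, centers))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative round-off

def evaluate(centers, coeffs, x, kernel=se_kernel):
    """Pointwise value f(x) of the kernel expansion."""
    return sum(a * kernel(c, x) for a, c in zip(coeffs, centers))

def noisy_oracle(centers, coeffs, sigma=0.1, seed=0):
    """Zeroth-order oracle x -> f(x) + N(0, sigma^2) (assumed noise model)."""
    rng = random.Random(seed)
    return lambda x: evaluate(centers, coeffs, x) + rng.gauss(0.0, sigma)

centers, coeffs, M = [0.0, 1.0], [1.0, -0.5], 2.0
print(rkhs_norm(centers, coeffs) <= M)  # norm is sqrt(1.25 - e^{-1/2}), about 0.80
query = noisy_oracle(centers, coeffs)
print(query(0.5))  # one noisy observation of f(0.5)
```

A learner only ever sees values returned by `query`, never `centers` or `coeffs`; the norm bound $M$ is the only structural knowledge it is given.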


References

Showing 1-10 of 34 references.

Finite-Time Analysis of Kernelised Contextual Bandits

This work proposes KernelUCB, a kernelised UCB algorithm, gives a cumulative regret bound through a frequentist analysis, and, for the agnostic case, improves the regret bound of GP-UCB both in terms of the kernel-dependent quantity and in terms of the RKHS norm of the reward function.
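A kernelised UCB rule scores each candidate arm by a kernel ridge regression mean plus a multiple of a width term that shrinks near arms already played. The sketch below is a generic index of that form, not the exact KernelUCB of the referenced work: the regulariser `lam`, the exploration weight `beta`, the squared-exponential kernel, and all function names are assumptions, and the constants KernelUCB uses in its width term may differ.

```python
import math

def se_kernel(x, y, ell=0.5):
    """Squared-exponential kernel (assumed)."""
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def posterior(X, y, x, lam=0.1, k=se_kernel):
    """Kernel ridge mean and squared width at x, given past arms X and rewards y."""
    n = len(X)
    A = [[k(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(A)
    kx = [k(xi, x) for xi in X]
    mean = sum(a * b for a, b in zip(kx, chol_solve(L, y)))
    var = k(x, x) - sum(a * b for a, b in zip(kx, chol_solve(L, kx)))
    return mean, max(var, 0.0)

def ucb_index(X, y, x, lam=0.1, beta=2.0):
    """Optimistic score: mean plus beta times the width."""
    mean, var = posterior(X, y, x, lam)
    return mean + beta * math.sqrt(var)

# One round: play the candidate arm with the largest index.
X, y = [0.0, 1.0], [0.2, 0.8]
arms = [i / 10 for i in range(11)]
print(max(arms, key=lambda a: ucb_index(X, y, a)))
```

Refactoring the regularised Gram matrix on every call keeps the sketch short; a practical implementation would update the factorisation incrementally as arms are added.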

Instance-Dependent Bounds for Zeroth-order Lipschitz Optimization with Error Certificates

The definition of the $r$-covering number of a subset $E$ of $\mathbb{R}$ implied by (Wainwright, 2019, Definition 5.1) is slightly stronger than the one used in this paper, because the elements $x_1, \dots, x_N$ of $r$-covers must belong to $E$ rather than merely to $\mathbb{R}$.
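The distinction above concerns where cover centers live. A greedy pass that promotes every not-yet-covered point to a center always returns centers drawn from the set itself, so for a finite set its size is an upper bound on the covering number under the stronger definition. This is a standard illustration, not a construction from the referenced paper; the Euclidean metric and the name `greedy_r_cover` are assumptions.

```python
import math

def greedy_r_cover(points, r):
    """Centers drawn from `points` such that every point is within
    Euclidean distance r of some center."""
    centers = []
    for p in points:
        # p is covered if an existing center is within r; otherwise promote it.
        if all(math.dist(p, c) > r for c in centers):
            centers.append(p)
    return centers

grid = [(i / 10,) for i in range(11)]  # 11 points in [0, 1]
cover = greedy_r_cover(grid, 0.25)
print(cover)  # [(0.0,), (0.3,), (0.6,), (0.9,)]
```

Every point ends up within $r$ of a center, and all centers are elements of the input set, which is exactly the property the stronger definition requires.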

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…

On Information Gain and Regret Bounds in Gaussian Process Bandits

General bounds on $\gamma_T$ are derived from the decay rate of the eigenvalues of the GP kernel. Specialised to commonly used kernels, they improve the existing bounds on $\gamma_T$ and, consequently, the regret bounds that rely on $\gamma_T$ in numerous settings.
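For Gaussian observation noise of variance $\sigma^2$, the information gain from querying a set of points $A$ is $\tfrac{1}{2}\log\det(I + \sigma^{-2} K_A)$, and $\gamma_T$ is its maximum over sets of size $T$. The minimal sketch below evaluates this quantity for one given set via a Cholesky factorisation; it does not perform the maximisation, and the kernel, noise level, and function names are assumptions.

```python
import math

def se_kernel(x, y, ell=0.5):
    """Squared-exponential kernel (assumed)."""
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def information_gain(points, sigma2=1.0, k=se_kernel):
    """0.5 * log det(I + K_A / sigma2) for the query set A = points."""
    n = len(points)
    B = [[(1.0 if i == j else 0.0) + k(points[i], points[j]) / sigma2
          for j in range(n)] for i in range(n)]
    L = cholesky(B)
    # log det B = 2 * sum(log L_ii), so the factor 1/2 cancels the 2.
    return sum(math.log(L[i][i]) for i in range(n))

print(information_gain([0.0]))       # equals 0.5 * log 2
print(information_gain([0.0, 0.0]))  # equals 0.5 * log 3: a repeated point adds less
print(information_gain([0.0, 3.0]))  # close to log 2: nearly independent points
```

The gap between a repeated point and two distant points is the diminishing-returns effect that eigenvalue-decay arguments turn into bounds on $\gamma_T$.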

Multi-Scale Zero-Order Optimization of Smooth Functions in an RKHS

  • S. Shekhar, T. Javidi
  • 2022 IEEE International Symposium on Information Theory (ISIT)
This work proposes the LP-GP-UCB algorithm, which augments a Gaussian process surrogate model with local polynomial estimators of the function to construct a multi-scale upper confidence bound that guides the search for the optimizer.

X-Armed Bandits

We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a…

On Kernelized Multi-armed Bandits

This work provides two new Gaussian process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), derives corresponding regret bounds, and proves a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.
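GP Thompson sampling replaces the optimistic index with a random draw: sample a function from the GP posterior on the candidate set and query that sample's maximiser. The minimal sketch below does one such step on a finite grid using a pure-Python Cholesky factorisation. The grid, the squared-exponential kernel, the noise variance `lam`, the diagonal `jitter`, and the function names are assumptions; the referenced GP-TS may additionally scale the posterior covariance, which this sketch omits.

```python
import math
import random

def se_kernel(x, y, ell=0.5):
    """Squared-exponential kernel (assumed)."""
    return math.exp(-((x - y) ** 2) / (2.0 * ell ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_ts_step(X, y, grid, lam=0.1, jitter=1e-6, rng=random):
    """Draw one posterior sample of f on `grid` and return its arg-max grid point."""
    n, m = len(X), len(grid)
    A = [[se_kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    LA = cholesky(A)
    alpha = chol_solve(LA, y)
    kXg = [[se_kernel(xi, g) for xi in X] for g in grid]  # k_X(g) for each grid point
    V = [chol_solve(LA, col) for col in kXg]              # (K_XX + lam I)^{-1} k_X(g)
    mean = [sum(a * b for a, b in zip(kXg[p], alpha)) for p in range(m)]
    cov = [[se_kernel(grid[p], grid[q])
            - sum(a * b for a, b in zip(kXg[p], V[q]))
            + (jitter if p == q else 0.0)                 # keeps the factorisation stable
            for q in range(m)] for p in range(m)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # mean + L z is a draw from N(mean, cov).
    sample = [mean[p] + sum(L[p][k] * z[k] for k in range(p + 1)) for p in range(m)]
    return grid[max(range(m), key=lambda p: sample[p])]

X, y = [0.1, 0.9], [0.3, 0.7]
grid = [i / 10 for i in range(11)]
print(gp_ts_step(X, y, grid, rng=random.Random(0)))
```

Because the choice depends on a random draw, repeated calls can pick different grid points; exploration comes from posterior uncertainty rather than from an explicit bonus term.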

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

This paper provides algorithm-independent lower bounds on the simple regret, measuring the suboptimality of a single point reported after $T$ rounds, and on the cumulative regret, measuring the sum of regrets over the $T$ chosen points.
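The two criteria score different things: simple regret $f(x^\star) - f(\hat{x}_T)$ judges one reported point, while cumulative regret $\sum_{t=1}^{T} \big(f(x^\star) - f(x_t)\big)$ charges every query. A toy computation, with an objective and query sequence that are assumptions made only for illustration:

```python
def f(x):
    """Assumed toy objective on [0, 1], maximised at x* = 0.5 with f(x*) = 1."""
    return 1.0 - (x - 0.5) ** 2

F_STAR = 1.0

def simple_regret(reported):
    """Suboptimality of a single reported point."""
    return F_STAR - f(reported)

def cumulative_regret(queries):
    """Sum of per-round regrets over all chosen points."""
    return sum(F_STAR - f(x) for x in queries)

queries = [0.0, 0.25, 0.5, 0.5]                   # the T = 4 chosen points
print(cumulative_regret(queries))                 # 0.25 + 0.0625 + 0 + 0 = 0.3125
print(simple_regret(queries[-1]))                 # 0.0 for reporting the last query
print(cumulative_regret(queries) / len(queries))  # average regret 0.078125
```

Reporting a query chosen uniformly at random gives expected simple regret equal to the average $R_T / T$, which is why cumulative-regret bounds also yield simple-regret bounds.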

Open Problem: Tight Online Confidence Intervals for RKHS Elements

The question of online confidence intervals in the RKHS setting is formalized; the main challenge appears to stem from the online (sequential) nature of the observation points.

Smooth Bandit Optimization: Generalization to Hölder Space

It is demonstrated that the proposed algorithm can exploit higher-order smoothness of the function, by deriving a regret upper bound of $\tilde{O}(T^{\frac{d+\alpha}{d+2\alpha}})$ for $\alpha > 1$ that matches the existing lower bound.
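As a worked reading of the exponent (an illustration, not a result taken from the referenced paper): at $\alpha = 1$ the formula evaluates to the familiar Lipschitz-bandit exponent, and larger smoothness pushes the exponent toward $1/2$.

$$
\left.\frac{d+\alpha}{d+2\alpha}\right|_{\alpha=1} = \frac{d+1}{d+2},
\qquad
\left.\frac{d+\alpha}{d+2\alpha}\right|_{d=1,\ \alpha=2} = \frac{3}{5},
\qquad
\lim_{\alpha \to \infty} \frac{d+\alpha}{d+2\alpha} = \frac{1}{2}.
$$

For example, with $d = 1$ and $\alpha = 2$ the bound reads $\tilde{O}(T^{3/5})$, compared with $\tilde{O}(T^{2/3})$ from the $\alpha = 1$ exponent.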