Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

@inproceedings{Srinivas2010GaussianPO,
  title={Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design},
  author={Niranjan Srinivas and Andreas Krause and Sham M. Kakade and Matthias W. Seeger},
  booktitle={ICML},
  year={2010}
}
Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We analyze GP-UCB, an intuitive upper-confidence-based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds…
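The selection rule at the heart of GP-UCB is compact: at round $t$, play $x_t = \arg\max_{x \in D} \mu_{t-1}(x) + \beta_t^{1/2} \sigma_{t-1}(x)$, where $\mu_{t-1}$ and $\sigma_{t-1}$ are the GP posterior mean and standard deviation. Below is a minimal, hypothetical Python sketch over a finite candidate set; the objective f, the RBF kernel, and the simplified $\beta_t$ schedule are illustrative assumptions, not the paper's exact choices (the paper derives specific schedules for its regret theorems).

# Minimal GP-UCB sketch over a finite candidate set. The objective f, the
# RBF kernel, and the beta_t schedule are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)  # decision set D

def f(x):
    # Unknown noisy objective (hypothetical stand-in).
    return np.sin(6.0 * x) + 0.1 * rng.standard_normal(x.shape)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-2)
X, y = [], []
for t in range(1, 31):
    if X:
        gp.fit(np.vstack(X), np.array(y))
        mu, sigma = gp.predict(candidates, return_std=True)
    else:  # no data yet: use the prior
        mu, sigma = np.zeros(len(candidates)), np.ones(len(candidates))
    beta_t = 2.0 * np.log(len(candidates) * t ** 2)  # simplified schedule
    ucb = mu + np.sqrt(beta_t) * sigma
    x_t = candidates[int(np.argmax(ucb))]  # optimism: maximize the upper bound
    X.append(x_t)
    y.append(float(f(x_t)))

The exploration-exploitation trade-off sits entirely in $\beta_t$: larger values weight the posterior standard deviation more heavily, and the paper's analysis ties the cumulative cost of this exploration to the maximal information gain.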

Citations

Optimal Order Simple Regret for Gaussian Process Bandits
TLDR
This work proves an $\tilde{O}(\sqrt{\gamma_N/N})$ bound on the simple regret of a pure exploration algorithm, significantly tighter than the existing bounds and order-optimal up to logarithmic factors in the cases where a lower bound on regret is known.
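For reference, simple regret measures only the suboptimality of a final recommended point $\hat{x}_N$, in contrast to the cumulative regret bounded for GP-UCB:

$$ r_N = f(x^\star) - f(\hat{x}_N), \qquad x^\star = \arg\max_{x \in D} f(x), $$

so the quoted result bounds $r_N$ by $\tilde{O}(\sqrt{\gamma_N/N})$, with $\gamma_N$ the maximal information gain after $N$ samples.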
On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization
In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can…
Regret Bounds for Gaussian-Process Optimization in Large Domains
TLDR
Upper bounds are provided on the suboptimality (Bayesian simple regret) of the solution found by optimization strategies closely related to the widely used expected improvement (EI) and upper confidence bound (UCB) algorithms.
On Information Gain and Regret Bounds in Gaussian Process Bandits
TLDR
General bounds on $\gamma_T$ are provided based on the decay rate of the eigenvalues of the GP kernel; their specialisation for commonly used kernels improves the existing bounds on $\gamma_T$, and consequently the regret bounds relying on $\gamma_T$, under numerous settings.
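Since $\gamma_T$ recurs throughout these results, it is worth recalling its definition from the Srinivas et al. paper: the maximum information gain obtainable from any $T$ observations,

$$ \gamma_T = \max_{A \subset D,\ |A| = T} \tfrac{1}{2} \log \det\!\left(I + \sigma^{-2} K_A\right), $$

where $K_A$ is the kernel matrix on the points in $A$ and $\sigma^2$ is the noise variance. Its growth rate is governed by the decay of the kernel's eigenvalues, e.g. polylogarithmic in $T$ for the squared-exponential kernel.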
Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
TLDR
BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or covariance of the GP.
Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration
TLDR
The Gaussian Process Upper Confidence Bound and Pure Exploration algorithm (GP-UCB-PE) is introduced, which combines the UCB strategy and pure exploration in the same batch of evaluations along the parallel iterations, and theoretical upper bounds on the regret with batches of size K are proved for this procedure.
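The batch construction sketches easily: the first point of each batch follows the usual UCB rule, and the remaining K-1 points greedily maximize posterior variance, which can be computed before the outcomes arrive because a GP's posterior variance depends only on input locations. A hedged Python fragment along these lines, reusing gp, candidates, and beta_t from the GP-UCB sketch above (the actual GP-UCB-PE algorithm additionally restricts exploration to a high-probability region, omitted here for brevity):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def select_batch(gp, candidates, beta_t, K):
    # Return K indices into `candidates`: one UCB point, then K-1
    # pure-exploration (maximum posterior variance) points.
    mu, sigma = gp.predict(candidates, return_std=True)
    batch = [int(np.argmax(mu + np.sqrt(beta_t) * sigma))]
    fantasy_X = list(gp.X_train_) + [candidates[batch[0]]]
    for _ in range(K - 1):
        # Refit on fantasized inputs with dummy targets: the posterior
        # variance (all we need here) is independent of the target values.
        # optimizer=None keeps the fitted kernel hyperparameters fixed.
        gp_f = GaussianProcessRegressor(kernel=gp.kernel_, alpha=gp.alpha,
                                        optimizer=None)
        gp_f.fit(np.vstack(fantasy_X), np.zeros(len(fantasy_X)))
        _, sig = gp_f.predict(candidates, return_std=True)
        nxt = int(np.argmax(sig))
        batch.append(nxt)
        fantasy_X.append(candidates[nxt])
    return batch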
Gaussian Process Bandit Optimization with Few Batches
TLDR
A batch algorithm inspired by batched finite-arm bandit algorithms is introduced, and it is shown that it achieves the cumulative regret upper bound $O^*(\sqrt{T\gamma_T})$ using $O(\log\log T)$ batches within time horizon $T$, where the $O^*(\cdot)$ notation hides dimension-independent logarithmic factors and $\gamma_T$ is the maximum information gain associated with the kernel.
On Kernelized Multi-armed Bandits
TLDR
This work provides two new Gaussian-process-based algorithms for continuous bandit optimization, Improved GP-UCB and GP-Thompson sampling (GP-TS), derives corresponding regret bounds, and derives a new self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension.
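GP-TS is even more compact than GP-UCB: draw a single function sample from the GP posterior over the candidate set and play its argmax. A minimal, hypothetical sketch (same fitted gp and candidates as in the GP-UCB sketch above):

import numpy as np

def gp_ts_step(gp, candidates, seed):
    # One Thompson-sampling round: play the argmax of one posterior draw.
    draw = gp.sample_y(candidates, n_samples=1, random_state=seed).ravel()
    return candidates[int(np.argmax(draw))]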
Regret Bounds for Noise-Free Bayesian Optimization
TLDR
This paper establishes new, tightest-known bounds for two algorithms, namely GP-UCB and Thompson sampling, under the assumption that the objective function is smooth in the sense of having a bounded norm in a Matérn RKHS.
Lenient Regret and Good-Action Identification in Gaussian Process Bandits
TLDR
This paper considers the problem of finding a single “good action” according to a known pre-specified threshold, and introduces several good-action identification algorithms that exploit knowledge of the threshold.
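One natural confidence-bound scheme for this problem (a hedged sketch under assumed names, not necessarily one of the paper's algorithms): sample where the upper confidence bound is highest, and stop as soon as some point's lower bound clears the threshold.

import numpy as np

def find_good_action(gp, candidates, threshold, beta, sample, max_rounds):
    # Return a point whose lower confidence bound clears `threshold`,
    # querying `sample(x)` for noisy evaluations; all names hypothetical.
    X, y = [], []
    for _ in range(max_rounds):
        if X:
            gp.fit(np.vstack(X), np.array(y))
            mu, sig = gp.predict(candidates, return_std=True)
        else:  # no data yet: use the prior
            mu, sig = np.zeros(len(candidates)), np.ones(len(candidates))
        lcb, ucb = mu - np.sqrt(beta) * sig, mu + np.sqrt(beta) * sig
        if lcb.max() >= threshold:
            return candidates[int(np.argmax(lcb))]  # confidently good
        x = candidates[int(np.argmax(ucb))]         # still plausibly good
        X.append(x)
        y.append(float(sample(x)))
    return None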
...

References

Showing 1-10 of 39 references
Regret Bounds for Gaussian Process Bandit Problems
TLDR
The main result is to bound the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, based on benign assumptions about the covariance function defining the Gaussian process.
Stochastic Linear Optimization under Bandit Feedback
TLDR
A nearly complete characterization of the classical stochastic k-armed bandit problem in terms of both upper and lower bounds for the regret is given, and two variants of an algorithm based on the idea of “upper confidence bounds” are presented.
Online Optimization in X-Armed Bandits
TLDR
The results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$.
Linearly Parameterized Bandits
TLDR
It is proved that the regret and Bayes risk are of order $\Theta(r\sqrt{T})$, by establishing a lower bound for an arbitrary policy and showing that a matching upper bound is obtained through a policy that alternates between exploration and exploitation phases.
The Price of Bandit Information for Online Optimization
TLDR
This paper presents an algorithm which achieves $O^*(n^{3/2}\sqrt{T})$ regret and presents lower bounds showing that this gap is at least $\sqrt{n}$, which is conjectured to be the correct order.
Near-optimal Nonmyopic Value of Information in Graphical Models
TLDR
This work addresses the long-standing problem of nonmyopically selecting the most informative subset of variables in a graphical model and presents the first efficient randomized algorithm providing a constant-factor $(1 - 1/e - \varepsilon)$ approximation guarantee for any $\varepsilon > 0$ with high confidence.
Multi-armed bandits in metric spaces
TLDR
This work defines an isometry invariant MaxMinCOV(X) which bounds from below the performance of Lipschitz MAB algorithms for X, and presents an algorithm which comes arbitrarily close to meeting this bound.
Finite-time Analysis of the Multiarmed Bandit Problem
TLDR
This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
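The policy family this summary refers to, UCB1, fits in a few lines; a minimal sketch with an assumed reward callback pull(arm) returning values in [0, 1]:

import numpy as np

def ucb1(pull, n_arms, horizon):
    # UCB1 (Auer et al., 2002): after one pull of each arm, play the arm
    # maximizing empirical mean + sqrt(2 ln t / n_i).
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(means + bonus))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running mean update
    return means, counts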
An Exact Algorithm for Maximum Entropy Sampling
TLDR
An upper bound for the entropy is established, based on the eigenvalue interlacing property, and incorporated in a branch-and-bound algorithm for the exact solution of the experimental design problem of selecting a most informative subset, having prespecified size, from a set of correlated random variables.
Information Consistency of Nonparametric Gaussian Process Methods
TLDR
By focussing on the concept of information consistency for Bayesian Gaussian process (GP) models, consistency results and convergence rates are obtained via a regret bound on cumulative log loss.
...