• Corpus ID: 49528229

Contextual bandits with surrogate losses: Margin bounds and efficient algorithms

@inproceedings{Foster2018ContextualBW,
  title={Contextual bandits with surrogate losses: Margin bounds and efficient algorithms},
  author={Dylan J. Foster and Akshay Krishnamurthy},
  booktitle={NeurIPS},
  year={2018}
}
We use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning. Using the ramp loss, we derive a new margin-based regret bound in terms of standard sequential complexity measures of a benchmark class of real-valued regression functions. Using the hinge loss, we derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regressors. Under realizability assumptions, our results also yield… 
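To make the two surrogates named in the abstract concrete, here is a minimal sketch of the textbook margin-based hinge and ramp losses. The function names, the `gamma` margin parameter, and the sign convention (a positive margin means a correct prediction) are assumptions for illustration; the paper's own cost-sensitive definitions may be parameterized differently.

```python
# Illustrative sketch only: standard margin-based surrogates, not the
# paper's exact cost-sensitive formulation. All names are hypothetical.

def zero_one_loss(margin: float) -> float:
    """Mistake indicator: 1 if the margin is non-positive, else 0."""
    return 1.0 if margin <= 0 else 0.0


def hinge_loss(margin: float, gamma: float = 1.0) -> float:
    """Convex surrogate: max(0, 1 - margin / gamma)."""
    return max(0.0, 1.0 - margin / gamma)


def ramp_loss(margin: float, gamma: float = 1.0) -> float:
    """Non-convex surrogate: the hinge loss clipped to [0, 1]."""
    return min(1.0, hinge_loss(margin, gamma))


if __name__ == "__main__":
    for m in (-3.0, -0.5, 0.0, 0.5, 2.0):
        print(m, zero_one_loss(m), ramp_loss(m), hinge_loss(m))
```

For every margin, the ramp loss lies between the zero-one loss and the hinge loss. This sandwich is why a bound on either surrogate controls the number of mistakes.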

Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits
TLDR
It is shown that GPE is regret-optimal (up to logarithmic factors) for policy classes with integrable entropy and the core techniques used to analyze GPE can be used to design an $\varepsilon$-greedy algorithm with regret bound matching that of the best algorithms to date.
Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning
TLDR
This work studies a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class and provides first-of-their-kind generalization guarantees and fast convergence rates.
Universal and data-adaptive algorithms for model selection in linear contextual bandits
TLDR
New algorithms are introduced that explore in a data-adaptive manner and provide model selection guarantees of the form $O(d^{\alpha} T^{1-\alpha})$ with no feature diversity conditions whatsoever, where $d$ denotes the dimension of the linear model and $T$ denotes the total number of rounds.
OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits
TLDR
This work designs a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards.
Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
We study the problem of efficient online multiclass linear classification with bandit feedback, where all examples belong to one of $K$ classes and lie in the $d$-dimensional Euclidean space.
Differentially Private Nonparametric Regression Under a Growth Condition
TLDR
It is shown that under the relaxed condition $\liminf_{\eta \downarrow 0} \eta \cdot \mathrm{sfat}_{\eta}(\mathcal{H}) = 0$, $\mathcal{H}$ is privately learnable, establishing the first nonparametric private learnability guarantee for classes $\mathcal{H}$ whose sequential fat-shattering dimension $\mathrm{sfat}_{\eta}(\mathcal{H})$ diverges as $\eta \downarrow 0$.
Online Pricing with Reserve Price Constraint for Personal Data Markets
TLDR
A contextual dynamic pricing mechanism with the reserve price constraint is proposed, which features the properties of ellipsoid for efficient online optimization, and can support linear and non-linear market value models with uncertainty.
Adaptive Learning: Algorithms and Complexity
TLDR
This thesis proves the equivalence of adaptive algorithms, probabilistic objects called martingale inequalities, and geometric objects called Burkholder functions, and provides a theory of learnability for adaptive online learning, a new margin theory paralleling that of classical statistical learning.
Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games
TLDR
Several new techniques are introduced, including a hierarchical aggregation rule to achieve the optimal cumulative loss for real-valued classes, a multi-scale extension of the proper online realizable learner of Hanneke et al. (2021), an approach to show that the output of such nonparametric learning algorithms is stable, and a proof that the minimax theorem holds in all online learnable games.
Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
TLDR
This work describes the minimax rates for contextual bandits with general, potentially nonparametric function classes, and provides the first universal and optimal reduction from contextual bandits to online regression, which requires no distributional assumptions beyond realizability.

References

Practical Contextual Bandits with Regression Oracles
TLDR
This work presents a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods, and typically gives comparable or superior results.
Contextual Bandits with Linear Payoff Functions
TLDR
An $O\big(\sqrt{Td \ln^3(KT \ln(T)/\delta)}\big)$ regret bound is proved that holds with probability $1-\delta$ for the simplest known upper confidence bound algorithm for this problem.
On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
This work characterizes the generalization ability of algorithms whose predictions are linear in the input vector. To this end, we provide sharp bounds for Rademacher and Gaussian complexities of
BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits
TLDR
It is shown that the adversarial version of the contextual bandit problem is learnable (and efficiently so) whenever the full-information supervised online learning problem has a non-trivial (and efficient) regret guarantee, and the method uses unlabeled data to make the problem computationally simple.
Adaptive Online Learning
TLDR
Modifications to recently introduced sequential complexity measures can be used to answer the question of whether there is some algorithm achieving this bound by providing sufficient conditions under which adaptive rates can be achieved, and a new type of adaptive bound for online linear optimization based on the spectral norm is derived.
Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning
TLDR
This work designs the first explicit algorithm achieving the minimax regret rate (up to log factors) and obtains algorithms for Lipschitz and semi-Lipschitz losses with regret bounds improving on the known bounds for standard bandit feedback.
Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
TLDR
It is proved that the regret of NEWTRON is $O(\log T)$ when $\alpha$ is a constant that does not vary with the horizon $T$, and at most $O(T^{2/3})$ if $\alpha$ is allowed to increase to infinity with $T$.
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that
Contextual Bandit Learning with Predictable Rewards
TLDR
A new lower bound is proved showing no algorithm can achieve superior performance in the worst case even with the realizability assumption, and it is shown that for any set of policies, there is a distribution over rewards such that the new algorithm has constant regret unlike the previous approaches.
Cost-sensitive Multiclass Classification Risk Bounds
TLDR
A bound is developed for the case of cost-sensitive multiclass classification and a convex surrogate loss that goes back to the work of Lee, Lin and Wahba and is as easy to calculate as in binary classification.