Corpus ID: 229298025

Experts with Lower-Bounded Loss Feedback: A Unifying Framework
Eyal Gofer, Guy Gilboa
The most prominent feedback models for the best expert problem are the full information and bandit models. In this work we consider a simple feedback model that generalizes both, where on every round, in addition to bandit feedback, the adversary provides a lower bound on the loss of each expert. Such lower bounds may be obtained in various scenarios, for instance, in stock trading or in assessing errors of certain measurement devices. For this model we prove optimal regret bounds (up to…
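The feedback protocol described in the abstract can be sketched as follows. This is an illustrative toy, not the paper's actual protocol: the choice of expert losses and the way lower bounds are generated here (scaling the true losses) are assumptions for demonstration only.

```python
def play_round(losses, chosen):
    """One round of the lower-bounded loss feedback model (sketch).

    The learner observes the chosen expert's exact loss (bandit
    feedback) plus an adversary-supplied lower bound on every
    expert's loss.
    """
    bandit_feedback = losses[chosen]
    # Hypothetical lower bounds: the adversary may reveal any values
    # not exceeding the true losses; here we simply scale them.
    lower_bounds = [0.5 * l for l in losses]
    return bandit_feedback, lower_bounds

# Toy round with three experts and losses in [0, 1].
losses = [0.2, 0.7, 0.4]
fb, lbs = play_round(losses, chosen=1)
```

Note how this model interpolates between the two classical settings: if each lower bound equals the true loss, the learner has full information; if the lower bounds are trivial (e.g. all zero), the feedback reduces to the pure bandit model.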


Regret Bounds and Minimax Policies under Partial Monitoring
The stochastic bandit game is considered, and it is proved that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
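The UCB1 policy referenced above plays the arm maximizing an empirical mean plus an exploration bonus. A minimal sketch for Bernoulli rewards in [0, 1] follows; the arm means, horizon, and reward generation are illustrative assumptions, not details from the cited work.

```python
import math
import random

def ucb1(arm_means, horizon, rng=random.Random(0)):
    """Minimal UCB1 for stochastic bandits with rewards in [0, 1]."""
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialize
        else:
            # index = empirical mean + sqrt(2 ln t / n_i) exploration bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        # Hypothetical Bernoulli reward for the pulled arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# With a clearly better arm, UCB1 concentrates its pulls on it.
counts = ucb1([0.3, 0.8], horizon=2000)
```

The distribution-dependent logarithmic regret mentioned in the summary comes from the exploration bonus shrinking as each arm's pull count grows, so suboptimal arms are pulled only O(log t) times.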
Higher-Order Regret Bounds with Switching Costs
This work examines online linear optimization with full information and switching costs (SCs), focuses on regret bounds that depend on properties of the loss sequences, and upper bounds the price of "at the money" call options, assuming bounds on the quadratic variation of a stock price and the minimum of summed gains and summed losses.
Efficient learning by implicit exploration in bandit problems with side observations
This work proposes the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions and defines a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback.
Better Algorithms for Benign Bandits
A new algorithm is proposed for the bandit linear optimization problem which obtains a regret bound of O(√Q), where Q is the total variation in the cost functions, and shows that it is possible to incur much less regret in a slowly changing environment even in the bandit setting.
From Bandits to Experts: On the Value of Side-Observations
Practical algorithms with provable regret guarantees are developed, which depend on non-trivial graph-theoretic properties of the information feedback structure and partially-matching lower bounds are provided.
Sparsity, variance and curvature in multi-armed bandits
A key new insight is to use regularizers satisfying more refined conditions than general self-concordance, yielding results related to sparsity, variance and curvature.
Improved second-order bounds for prediction with expert advice
New and sharper regret bounds are derived for the well-known exponentially weighted average forecaster and for a second forecaster with a different multiplicative update rule, expressed in terms of sums of squared payoffs, replacing larger first-order quantities appearing in previous bounds.
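The exponentially weighted average forecaster named in this summary maintains one weight per expert and shrinks it multiplicatively in the expert's loss. A minimal sketch follows; the learning rate and loss sequence are illustrative choices, and the refined second-order tuning from the cited paper is not reproduced here.

```python
import math

def exp_weights(loss_rounds, eta=0.5):
    """Exponentially weighted average forecaster (sketch).

    Each expert's weight decays by exp(-eta * loss) per round, so the
    normalized weights concentrate on low-cumulative-loss experts.
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    for losses in loss_rounds:
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(weights)
    return [w / total for w in weights]  # probability of following each expert

# Two experts; expert 0 incurs less loss, so it ends up with more mass.
probs = exp_weights([[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]])
```

Second-order bounds of the kind described replace the range-dependent terms in the classical analysis with sums of squared payoffs, which is why they can be much sharper when losses are small.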
Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback
A partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions, and observes some subset of the associated losses, is presented and studied.
Bandit Regret Scaling with the Effective Loss Range
We study how the regret guarantees of nonstochastic multi-armed bandits can be improved if the effective range of the losses in each round is small (e.g. the maximal difference between two losses in…
Extracting certainty from uncertainty: regret bounded by variation in costs
The question whether it is possible to bound the regret of an online algorithm by the variation of the observed costs is resolved, and bounds in the fully adversarial setting are proved, in two important online learning scenarios: prediction from expert advice, and online linear optimization.