Corpus ID: 220496455

Relaxing the I.I.D. Assumption: Adaptive Minimax Optimal Sequential Prediction with Expert Advice

@article{Bilodeau2020RelaxingTI,
  title={Relaxing the I.I.D. Assumption: Adaptive Minimax Optimal Sequential Prediction with Expert Advice},
  author={Blair Bilodeau and Jeffrey Negrea and Daniel M. Roy},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.06552}
}
We consider sequential prediction with expert advice when the data are generated stochastically, but the distributions generating the data may vary arbitrarily among some constraint set. We quantify relaxations of the classical I.I.D. assumption in terms of possible constraint sets, with I.I.D. at one extreme and an adversarial mechanism at the other. The Hedge algorithm, long known to be minimax optimal in the adversarial regime, has recently been shown to also be minimax optimal in the I.I.D…
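The Hedge algorithm named in the abstract keeps one weight per expert and predicts with the exponentially weighted mixture of the experts. Below is a minimal Python sketch of the anytime variant with a decreasing learning rate η_t = c·sqrt(ln K / t). It assumes K experts, losses in [0, 1], and full-information feedback; the function name `hedge`, the constant `c`, and the list-of-rounds input format are illustrative assumptions, not code from the paper.

```python
import math
import random


def hedge(loss_rounds, c=1.0):
    """Anytime Hedge sketch (hypothetical helper, not the paper's code).

    loss_rounds: list of per-round loss vectors, one entry per expert, each in [0, 1].
    Learning rate (assumed schedule): eta_t = c * sqrt(ln K / t).
    Returns (cumulative expected loss of the learner, regret vs. best expert in hindsight).
    """
    k = len(loss_rounds[0])
    cum = [0.0] * k  # cumulative loss of each expert before the current round
    learner = 0.0
    for t, losses in enumerate(loss_rounds, start=1):
        eta = c * math.sqrt(math.log(k) / t)
        m = min(cum)  # shift by the minimum: same normalized weights, no overflow
        w = [math.exp(-eta * (L - m)) for L in cum]
        z = sum(w)
        p = [wi / z for wi in w]  # probability assigned to each expert this round
        # Expected loss of the learner under the mixture p.
        learner += sum(pi * li for pi, li in zip(p, losses))
        cum = [L + l for L, l in zip(cum, losses)]
    return learner, learner - min(cum)


# Usage: expert 0 never errs, expert 1 always does.
learner_loss, regret = hedge([[0.0, 1.0]] * 100)
print(round(regret, 2))

# Random losses in [0, 1] for 5 experts over 500 rounds.
rng = random.Random(0)
rounds = [[rng.random() for _ in range(5)] for _ in range(500)]
print(round(hedge(rounds)[1], 2))
```

In the first usage example the regret stays small because the weight on the always-wrong expert decays quickly as its cumulative loss grows; recomputing η_t each round is what makes the sketch "anytime" (no horizon needs to be known in advance).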

References

Showing 1-10 of 38 references
Online Learning: Stochastic, Constrained, and Smoothed Adversaries
TLDR
This work defines the minimax value of a game in which the adversary is restricted in its moves, capturing stochastic and non-stochastic assumptions on the data, and introduces a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst case.
Adaptation to Easy Data in Prediction with Limited Advice
TLDR
An online learning algorithm with improved regret guarantees for "easy" loss sequences; in the stochastic setting, SODA achieves a pseudo-regret bound that holds simultaneously with the adversarial regret guarantee.
Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning
TLDR
This work considers online learning algorithms that guarantee worst-case regret rates in adversarial environments yet adapt optimally to favorable stochastic environments (so they perform well in a variety of settings of practical importance), and quantifies the friendliness of stochastic environments by means of the well-known Bernstein condition.
On the optimality of the Hedge algorithm in the stochastic regime
TLDR
It is proved that anytime Hedge with decreasing learning rate, one of the simplest algorithms for the problem of prediction with expert advice, is remarkably both worst-case optimal and adaptive to the easier stochastic and adversarial-with-a-gap problems.
An Optimal Algorithm for Stochastic and Adversarial Bandits
TLDR
The proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018).
The Best of Both Worlds: Stochastic and Adversarial Bandits
TLDR
SAO (Stochastic and Adversarial Optimal) combines the O(√n) worst-case regret of Exp3 for adversarial rewards with the (poly)logarithmic regret of UCB1 for stochastic rewards.
Minimax Policies for Adversarial and Stochastic Bandits
TLDR
This work fills a long-open gap in the characterization of the minimax rate for the multi-armed bandit problem and proposes a new family of randomized algorithms based on an implicit normalization, as well as a new analysis.
Prediction with Expert Advice by Following the Perturbed Leader for General Weights
TLDR
The analysis of the alternative “Follow the Perturbed Leader” (FPL) algorithm from [KV03] (based on Hannan’s algorithm) is easier; loss bounds are derived for adaptive learning rates, both for finite expert classes with uniform weights and for countable expert classes with arbitrary weights.
One Practical Algorithm for Both Stochastic and Adversarial Bandits
TLDR
The algorithm augments the EXP3 algorithm with a new control lever in the form of exploration parameters tailored individually for each arm, and it retains a "logarithmic" regret guarantee in the stochastic regime even when some observations are contaminated by an adversary.
A second-order bound with excess losses
TLDR
Online aggregation of the predictions of experts is studied, and new second-order regret bounds in the standard setting are obtained via a version of the Prod algorithm with multiple learning rates and two versions of the polynomially weighted average algorithm.
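The Follow the Perturbed Leader (FPL) reference in the list replaces Hedge's exponential weighting with a random perturbation of the experts' cumulative losses. The Python sketch below uses one common variant, fresh Exp(1) perturbations each round scaled by a hypothetical adaptive rate η_t = sqrt(ln K / t); it is an illustration under these assumptions, not the exact scheme analyzed in that reference, and the name `fpl` and its input format are hypothetical.

```python
import math
import random


def fpl(loss_rounds, rng=None):
    """Follow the Perturbed Leader sketch (hypothetical helper, one common variant).

    Each round, expert i's cumulative loss is perturbed by -q_i / eta_t with
    q_i ~ Exp(1) drawn fresh, and the learner follows the expert with the
    smallest perturbed loss. Assumes at least two experts and losses in [0, 1].
    Returns (cumulative loss of the learner, regret vs. best expert in hindsight).
    """
    if rng is None:
        rng = random.Random(0)  # seeded so the sketch is reproducible
    k = len(loss_rounds[0])
    assert k >= 2, "sketch assumes at least two experts"
    cum = [0.0] * k  # cumulative loss of each expert before the current round
    learner = 0.0
    for t, losses in enumerate(loss_rounds, start=1):
        eta = math.sqrt(math.log(k) / t)  # assumed adaptive learning rate
        perturbed = [L - rng.expovariate(1.0) / eta for L in cum]
        i = min(range(k), key=perturbed.__getitem__)  # perturbed leader
        learner += losses[i]  # learner incurs the chosen expert's loss
        cum = [L + l for L, l in zip(cum, losses)]
    return learner, learner - min(cum)


# Usage: expert 0 never errs, expert 1 always does.
learner_loss, regret = fpl([[0.0, 1.0]] * 200)
print(learner_loss, regret)
```

Unlike Hedge, FPL commits to a single expert each round, so only a perturbed argmin is needed instead of a full weight vector; the randomness of the perturbation plays the role that the mixture weights play in Hedge.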