Corpus ID: 3136373

Towards Minimax Online Learning with Unknown Time Horizon

@inproceedings{Luo2014TowardsMO,
  title={Towards Minimax Online Learning with Unknown Time Horizon},
  author={Haipeng Luo and Robert E. Schapire},
  booktitle={ICML},
  year={2014}
}
We consider online learning when the time horizon is unknown. We apply a minimax analysis, beginning with the fixed horizon case, and then moving on to two unknown-horizon settings, one that assumes the horizon is chosen randomly according to some distribution, and the other which allows the adversary full control over the horizon. For the random horizon setting with restricted losses, we derive a fully optimal minimax algorithm. And for the adversarial horizon setting, we prove a nontrivial… 
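
To make the setting concrete: in the experts framework the abstract refers to, a learner repeatedly places a probability distribution over $N$ experts and suffers the corresponding mixture of their losses, aiming for small regret against the best single expert without knowing in advance how many rounds will be played. The sketch below is only an illustration under that standard setup: it implements the usual anytime baseline (Hedge with the time-decaying learning rate $\sqrt{\ln N / t}$), not the minimax algorithms derived in the paper, and all function and variable names are our own.

```python
import numpy as np

def hedge_distribution(cum_losses, eta):
    """Exponential-weights distribution over experts for learning rate eta."""
    w = np.exp(-eta * (cum_losses - cum_losses.min()))  # shift for numerical stability
    return w / w.sum()

def anytime_hedge_regret(loss_matrix):
    """Hedge with the time-decaying rate eta_t = sqrt(ln N / t).

    Standard anytime baseline for an unknown horizon -- NOT the minimax
    algorithm of Luo & Schapire; it only illustrates the protocol: at each
    round t the learner commits to a distribution over the N experts before
    seeing that round's loss vector.
    """
    T, N = loss_matrix.shape
    cum = np.zeros(N)
    learner_loss = 0.0
    for t in range(1, T + 1):
        eta = np.sqrt(np.log(N) / t)
        p = hedge_distribution(cum, eta)
        learner_loss += p @ loss_matrix[t - 1]
        cum += loss_matrix[t - 1]
    return learner_loss - cum.min()   # regret against the best single expert

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    losses = rng.random((1000, 5))    # the learner never uses the horizon value 1000
    print("regret after 1000 rounds:", anytime_hedge_regret(losses))
```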

Citations

Optimal anytime regret for two experts
TLDR
This work designs the first minimax optimal algorithm for minimizing regret in the anytime setting, considers the case of two experts, and proves that the optimal regret is $\gamma\sqrt{t}/2$ at all time steps $t$.
Achievability of asymptotic minimax regret by horizon-dependent and horizon-independent strategies
TLDR
It is demonstrated that an easily implementable Bayes mixture based on a conjugate Dirichlet prior with a simple dependency on n achieves asymptotic minimaxity for all sequences, simplifying earlier similar proposals.
Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem
TLDR
The regret of the game is characterized as scaling as $\sqrt{8T/(9\pi)} \pm \log(T)^2$, which gives for the first time the optimal constant in the leading $\sqrt{T}$ term of the regret.
Regret-optimal Strategies for Playing Repeated Games with Discounted Losses
TLDR
This paper presents a novel set-valued dynamic programming approach for designing exact regret-optimal policies for playing repeated games with discounted losses, and describes the set of minimal achievable guarantees as the fixed point of a dynamic programming operator defined on the space of Pareto frontiers of convex and compact sets.
Towards Optimal Algorithms for Prediction with Expert Advice
TLDR
The proof shows that the probability matching algorithm is not only optimal against this particular randomized adversary, but also minimax optimal, and provides a general framework for designing the optimal algorithm and adversary for an arbitrary number of experts.
Regret Minimization in Repeated Games: A Set-Valued Dynamic Programming Approach
TLDR
This paper presents a novel set-valued dynamic programming approach for characterizing regret-optimal policies in repeated games with discounted losses and finite action sets, and proposes a procedure based on approximate value iteration to compute $\epsilon$-regret-optimal strategies for any $\epsilon>0$, for the case where the decision maker has only two available actions.
An Approximate Dynamic Programming Approach to Adversarial Online Learning
TLDR
An approximate dynamic programming (ADP) approach is proposed to compute approximations of the optimal strategies and of the minimal losses that can be guaranteed in discounted repeated games with vector-valued losses, suggesting the significant potential of ADP-based approaches in adversarial online learning.
Tight Lower Bounds for the Multiplicative Weights Algorithm on the Experts Problem
TLDR
Tight lower bounds are developed for the regret achievable by the widely used Multiplicative Weights Algorithm (MWA), and it is shown that the structures of the optimal adversary for the finite and geometric horizon models are mirror images of each other in a strong sense.
An Approximate Dynamic Programming Approach to Repeated Games with Vector Losses
TLDR
Numerical evaluations demonstrate the sub-optimality of well-known off-the-shelf online learning algorithms such as Hedge, and significantly improved performance when approximately optimal strategies are used in these settings.
Tight Lower Bounds for Multiplicative Weights Algorithmic Families
TLDR
This work develops simple adversarial primitives that lend themselves to various combinations leading to sharp lower bounds for many algorithmic families, and uses these primitives to show that the classic Multiplicative Weights Algorithm (MWA) has a regret of $\sqrt{T\ln(k)/2}$; a rough numerical illustration of this quantity is sketched just after this list.
...
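
The quantity $\sqrt{T\ln(k)/2}$ appearing in the last entry above is the classical worst-case regret guarantee for Multiplicative Weights with a horizon-tuned learning rate, which the works above show is attained by suitable adversaries. The sketch below is our own illustration, not code from any cited paper: it runs a textbook Multiplicative Weights learner (all names are ours) on i.i.d. random $\{0,1\}$ losses and prints the realized regret next to the $\sqrt{T\ln(k)/2}$ value. Random losses produce far smaller regret; only an adversarial loss sequence drives MWA to the bound.

```python
import numpy as np

def multiplicative_weights_regret(loss_matrix, eta):
    """Run Multiplicative Weights with fixed rate eta; return regret vs. the best expert."""
    T, k = loss_matrix.shape
    cum = np.zeros(k)          # cumulative loss of each expert
    learner_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum - cum.min()))   # shift for numerical stability
        p = w / w.sum()                        # distribution over experts
        learner_loss += p @ loss_matrix[t]     # expected loss this round
        cum += loss_matrix[t]
    return learner_loss - cum.min()

if __name__ == "__main__":
    T, k = 10_000, 8
    rng = np.random.default_rng(1)
    losses = (rng.random((T, k)) < 0.5).astype(float)  # i.i.d. {0,1} losses
    eta = np.sqrt(8 * np.log(k) / T)   # horizon-tuned rate behind the classical bound
    regret = multiplicative_weights_regret(losses, eta)
    bound = np.sqrt(T * np.log(k) / 2)
    print(f"realized regret {regret:.1f}  vs  sqrt(T ln k / 2) = {bound:.1f}")
```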

References

Showing 1-10 of 33 references
A Parameter-free Hedging Algorithm
TLDR
A new notion of regret is introduced that is more natural for applications with a large number of actions, along with a parameter-free hedging algorithm whose performance is close to the best bounds achieved by previous algorithms with optimally-tuned parameters.
Follow the leader if you can, hedge if you must
TLDR
Two algorithms are introduced: AdaHedge, a new way of dynamically tuning the learning rate in Hedge without using the doubling trick and with improved worst-case guarantees, and FlipFlop, the first method that provably combines the best of both worlds.
Adaptive Algorithms for Online Decision Problems
TLDR
An algorithm for the tree update problem that is statically optimal for every sufficiently long contiguous subsequence of accesses is given, which combines techniques from data streaming algorithms, composition of learning algorithms, and a twist on the standard experts framework.
Optimal Strategies and Minimax Lower Bounds for Online Convex Games
TLDR
This work analyzes online convex game settings from a minimax perspective, deriving minimax strategies and lower bounds in each case and proving that the existing algorithms are essentially optimal.
Regret bounds for prediction problems
TLDR
A unified framework for reasoning about worst-case regret bounds for learning algorithms is presented, based on the theory of duality of convex functions; it brings together results from computational learning theory and Bayesian statistics to derive new proofs of known theorems, new theorems about known algorithms, and new algorithms.
Lower bounds on individual sequence regret
TLDR
This work lower bound the individual sequence anytime regret of a large family of online algorithms, and shows that the analysis can be generalized to accommodate diverse measures of variation beside quadratic variation.
Minimax Optimal Algorithms for Unconstrained Linear Optimization
TLDR
This work designs and analyzes minimax-optimal algorithms for online linear optimization games where the player's choice is unconstrained, and gives a thorough analysis of the minimax behavior of the game, providing characterizations for the value and the adversary's optimal strategy.
Adaptive and Self-Confident On-Line Learning Algorithms
TLDR
Although the optimally tuned bounds depend on the whole sequence of examples, this paper shows that essentially the same bounds can be obtained when the algorithms adaptively tune their learning rates as the examples in the sequence are progressively revealed.
Prediction by random-walk perturbation
We propose a version of the follow-the-perturbed-leader online prediction algorithm in which the cumulative losses are perturbed by independent symmetric random walks. The forecaster is shown to…
The robustness of the p-norm algorithms
TLDR
A family of on-line algorithms called p-norm algorithms, introduced by Grove, Littlestone and Schuurmans in the context of deterministic binary classification, is studied; it is shown how to adapt these algorithms for use in the regression setting, and worst-case bounds on the square loss are proved.
...