• Corpus ID: 231698562

Optimistic and Adaptive Lagrangian Hedging

Ryan D'Orazio and Ruitong Huang
In online learning, an algorithm plays against an environment with losses possibly picked by an adversary at each round. The generality of this framework includes problems that are not adversarial, for example offline optimization or saddle point problems (i.e., min-max optimization). However, online algorithms are typically not designed to leverage additional structure present in non-adversarial problems. Recently, slight modifications to well-known online algorithms such as optimism and…
Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games
This work introduces the extensive-form regret minimization (EFR) algorithm, and identifies behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths.
Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly


A Modern Introduction to Online Learning
This monograph introduces the basic concepts of Online Learning through a modern view of Online Convex Optimization, and presents first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings.
Bounds for Regret-Matching Algorithms
A general class of learning algorithms, regret-matching algorithms, and a regret-based framework for analyzing their performance in online decision problems are introduced, based on a set Φ of transformations over the set of actions.
Online Optimization with Gradual Variations
It is shown that, for linear and general smooth convex loss functions, an online algorithm modified from gradient descent can achieve a regret that scales only as the square root of the deviation; as an application, a logarithmic regret bound is also obtained for the portfolio management problem.
No-regret Algorithms for Online Convex Programs
Lagrangian Hedging algorithms are derived from a general class of potential functions and are a direct generalization of known learning rules such as weighted majority and external-regret matching. Regret bounds are proved, and the algorithms are demonstrated learning to play one-card poker.
Optimization, Learning, and Games with Predictable Sequences
It is proved that a version of Optimistic Mirror Descent can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of O((log T)/T).
Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions
It is shown that when the goal is minimizing regret, rather than computing a Nash equilibrium, optimistic methods can outperform CFR+, even in deep game trees; the decomposition induced by dilated distance-generating functions mirrors the structure of the counterfactual regret minimization framework.
Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization
It is shown that OMWU monotonically improves the Kullback-Leibler divergence of the current iterate to the (appropriately normalized) min-max solution until it enters a neighborhood of the solution and becomes a contracting map converging to the exact solution.
Regret Minimization with Function Approximation in Extensive-Form Games
Theoretical results for CFR are extended to the setting of function approximation, and the resulting worst-case guarantees are complemented with experiments on several common benchmark games with sequential decision making and imperfect information.
Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent
Predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms (CFR+, DCFR, LCFR) across all games but two of the poker games and Liar's Dice, sometimes by two or more orders of magnitude.
A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds
This paper provides a self-contained, modular analysis of the two workhorses of online learning: (general) adaptive versions of Mirror Descent and the Follow-the-Regularized-Leader algorithms, and presents algorithms with improved variational bounds for smooth, composite objectives.
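
Several of the entries above concern regret matching, one of the learning rules that Lagrangian Hedging generalizes, and its use in zero-sum games. As an illustration only, the following Python sketch runs plain (non-optimistic) regret matching in self-play on a small matrix game and measures how far the averaged strategies are from equilibrium. The payoff matrix, the function names, and the iteration count are assumptions made for this example; none of them come from the listed papers.

```python
# Hypothetical 2x2 zero-sum game: the row player maximizes x^T A y,
# the column player minimizes it. Chosen only for illustration.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / n] * n
    return [p / total for p in positive]


def self_play(payoff, iterations):
    """Run regret matching for both players; return the average strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    regret_x, regret_y = [0.0] * n_rows, [0.0] * n_cols
    sum_x, sum_y = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching_strategy(regret_x)
        y = regret_matching_strategy(regret_y)
        # Expected utility of every pure action against the opponent's mix.
        u_x = [sum(payoff[i][j] * y[j] for j in range(n_cols))
               for i in range(n_rows)]
        u_y = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
               for j in range(n_cols)]
        # Expected utility of the strategy actually played.
        v_x = sum(x[i] * u_x[i] for i in range(n_rows))
        v_y = sum(y[j] * u_y[j] for j in range(n_cols))
        # Accumulate instantaneous regrets and the played strategies.
        for i in range(n_rows):
            regret_x[i] += u_x[i] - v_x
            sum_x[i] += x[i]
        for j in range(n_cols):
            regret_y[j] += u_y[j] - v_y
            sum_y[j] += y[j]
    avg_x = [s / iterations for s in sum_x]
    avg_y = [s / iterations for s in sum_y]
    return avg_x, avg_y


def exploitability(payoff, x, y):
    """Sum of both players' best-response gains against (x, y)."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


if __name__ == "__main__":
    avg_x, avg_y = self_play(PAYOFF, 10000)
    print("average row strategy:", avg_x)
    print("average column strategy:", avg_y)
    print("exploitability:", exploitability(PAYOFF, avg_x, avg_y))
```

For this particular matrix the unique equilibrium has both players putting probability 2/5 on their first action, so the averaged strategies should drift toward that point and the exploitability should shrink as the number of iterations grows. The sketch uses the averaged iterates because the regret guarantee of regret matching applies to the average play, not to the last iterate.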