Publications
More Adaptive Algorithms for Adversarial Bandits
TLDR
The main idea of the algorithm is to apply optimism and adaptivity techniques to the well-known Online Mirror Descent framework with a special log-barrier regularizer, designing appropriate optimistic predictions and correction terms within this framework.
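To make the log-barrier update concrete, here is a minimal Python sketch of one Online Mirror Descent step on the probability simplex with the log-barrier regularizer. This is an illustration, not the paper's code: the paper's algorithm additionally uses optimistic predictions and correction terms, which are omitted here, and the function name and step size are my own.

```python
import numpy as np

def log_barrier_omd_step(w, loss, eta):
    """One OMD step on the probability simplex with the log-barrier
    regularizer psi(w) = (1/eta) * sum_i -log(w_i).

    Optimality conditions give 1/w_new_i = a_i - mu, where
    a_i = 1/w_i + eta * loss_i and the multiplier mu enforces
    sum_i w_new_i = 1; mu is found by bisection.
    """
    a = 1.0 / w + eta * loss
    n = len(w)
    # The root of f(mu) = sum_i 1/(a_i - mu) = 1 lies in this bracket,
    # since f(a.min() - n) <= 1 and f(a.min() - 1) >= 1.
    lo, hi = a.min() - n, a.min() - 1.0
    for _ in range(100):
        mu = (lo + hi) / 2.0
        if np.sum(1.0 / (a - mu)) < 1.0:
            lo = mu  # f is increasing in mu: move right if mass < 1
        else:
            hi = mu
    return 1.0 / (a - (lo + hi) / 2.0)

# Toy usage: three arms, uniform start, two rounds of losses.
w = np.ones(3) / 3
for loss in ([0.9, 0.1, 0.5], [0.8, 0.2, 0.4]):
    w = log_barrier_omd_step(w, np.asarray(loss), eta=0.5)
print(w)  # mass shifts toward the low-loss arm
```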
A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free
TLDR
Proposes the first contextual bandit algorithm that is parameter-free, efficient, and optimal in terms of dynamic regret; the algorithm introduces replay phases, in which it acts according to its previous decisions for a certain amount of time in order to detect non-stationarity.
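As a rough illustration of the replay idea only (my own toy sketch, not the paper's procedure; the actual scheduling of replay phases and the statistical test are more involved), one can re-execute decisions recorded in an earlier interval and flag non-stationarity when those decisions now earn notably different rewards. The helper name and threshold are hypothetical.

```python
import numpy as np

def replay_detects_drift(old_actions, old_rewards, pull_now, threshold=0.2):
    """Toy replay phase: re-play actions recorded in an earlier interval
    via pull_now(action) and flag non-stationarity when their mean reward
    now differs notably from the mean they earned back then."""
    new_rewards = [pull_now(a) for a in old_actions]
    return abs(np.mean(new_rewards) - np.mean(old_rewards)) > threshold

# Toy usage: arm 0's mean reward dropped from 0.9 to 0.3 between intervals.
rng = np.random.default_rng(0)
old_actions = [0] * 50
old_rewards = rng.binomial(1, 0.9, size=50).tolist()  # recorded earlier
pull_now = lambda a: rng.binomial(1, 0.3)             # environment today
print(replay_detects_drift(old_actions, old_rewards, pull_now))  # True
```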
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
TLDR
This result significantly improves over the $\mathcal{O}(T^{3/4})$ regret achieved by the only existing model-free algorithm by Abbasi-Yadkori et al. (2019a) for ergodic MDPs in the infinite-horizon average-reward setting.
Online Reinforcement Learning in Stochastic Games
TLDR
Proposes the UCSG algorithm, which achieves sublinear regret relative to the game value when competing with an arbitrary opponent, improving on previous results in the same setting.
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
TLDR
Develops the first general semi-bandit algorithm that simultaneously achieves optimal regret in stochastic and in adversarial environments, without knowledge of the regime or of the number of rounds $T$.
Linear Last-iterate Convergence in Constrained Saddle-point Optimization
TLDR
Significantly expands the understanding of last-iterate convergence for OGDA and OMWU in the constrained setting, and introduces a sufficient condition, satisfied by strongly-convex-strongly-concave functions, under which OGDA exhibits concrete last-iterate convergence rates with a constant learning rate.
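For readers unfamiliar with OGDA, here is a minimal sketch of projected Optimistic Gradient Descent Ascent on a box-constrained, strongly-convex-strongly-concave toy problem, the regime in which the above condition holds. The `ogda` helper, the step size, and the box constraint are my own illustrative choices, not the paper's setup.

```python
import numpy as np

def ogda(grad_x, grad_y, x0, y0, eta=0.1, steps=500, lo=-1.0, hi=1.0):
    """Projected Optimistic Gradient Descent Ascent on a box constraint.

    Update: z_{t+1} = Proj(z_t - eta * (2*F(z_t) - F(z_{t-1}))), where
    F(z) = (grad_x(x, y), -grad_y(x, y)) is the saddle-point gradient
    operator; the projection is coordinate-wise clipping onto [lo, hi].
    """
    x, y = x0.copy(), y0.copy()
    gx_prev, gy_prev = grad_x(x, y), grad_y(x, y)  # first step is plain GDA
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)        # simultaneous evaluation
        x = np.clip(x - eta * (2 * gx - gx_prev), lo, hi)  # descent in x
        y = np.clip(y + eta * (2 * gy - gy_prev), lo, hi)  # ascent in y
        gx_prev, gy_prev = gx, gy
    return x, y

# Strongly-convex-strongly-concave toy:
# f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2, unique saddle point (0, 0).
A = np.array([[1.0, 2.0], [-1.0, 0.5]])
gx = lambda x, y: x + A @ y   # gradient of f in x
gy = lambda x, y: A.T @ x - y  # gradient of f in y
x, y = ogda(gx, gy, np.ones(2), -np.ones(2))
print(x, y)  # the last iterate converges linearly toward (0, 0)
```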
Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information
This joint extended abstract introduces and compares the results of Auer et al. (2019) and Chen et al. (2019), both of which resolve the problem of achieving optimal dynamic regret for non-stationary bandits without prior information.
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
TLDR
Develops several new algorithms, inspired by adversarial linear bandits, for learning Markov Decision Processes in the infinite-horizon average-reward setting with linear function approximation.
Taking a hint: How to leverage loss predictors in contextual bandits?
TLDR
Provides a complete answer to the question of whether one can improve over the minimax regret over $T$ rounds when the total error $\mathcal{E} \leq T$ of the predictor is relatively small, including upper and lower bounds for various settings: adversarial versus stochastic environments, known versus unknown $\mathcal{E}$, and single versus multiple predictors.
Tracking the Best Expert in Non-stationary Stochastic Environments
TLDR
Introduces a new parameter $\Lambda$, which measures the total statistical variance of the loss distributions over the $T$ rounds of the process, studies how this quantity affects the regret, proposes algorithms with upper-bound guarantees, and proves matching lower bounds.
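The exact definition of $\Lambda$ is given only in the paper; the sketch below shows one natural reading, summing the per-round variances of the loss distribution over $T$ rounds. Treat the aggregation rule as an assumption, not the paper's formula.

```python
import numpy as np

def total_variance(loss_samples):
    """Hedged reading of Lambda: the sum over rounds of the (empirical)
    variance of that round's loss distribution. loss_samples[t] holds
    i.i.d. draws of the loss at round t."""
    return sum(np.var(s) for s in loss_samples)

# Toy: T = 3 rounds; a deterministic round contributes zero variance.
rng = np.random.default_rng(1)
samples = [rng.normal(0.5, 0.1, 1000),
           rng.normal(0.2, 0.3, 1000),
           np.full(1000, 0.7)]
print(total_variance(samples))  # roughly 0.1^2 + 0.3^2 + 0 = 0.10
```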