Publications
Policy Gradient Methods for Reinforcement Learning with Function Approximation
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation converges to a locally optimal policy.
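As a rough illustration of the policy-gradient idea the paper analyzes, here is a minimal REINFORCE-style sketch on a toy two-armed problem. The softmax parameterization, step size, fixed baseline, and reward probabilities are all illustrative assumptions, not from the paper:

```python
import numpy as np

# Minimal REINFORCE-style policy-gradient sketch on a toy two-armed problem.
# All constants below are illustrative assumptions, not from the paper.
rng = np.random.default_rng(0)
theta = np.zeros(2)                  # one logit per action
true_reward = np.array([0.2, 0.8])   # hypothetical Bernoulli reward means

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

alpha = 0.1
for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    r = rng.binomial(1, true_reward[a])
    grad_log = -probs                       # grad of log pi(a) is e_a - probs
    grad_log[a] += 1.0
    theta += alpha * (r - 0.5) * grad_log   # 0.5 is a simple fixed baseline

print(softmax(theta).round(2))       # mass concentrates on the better action
```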
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
A framework based on learning a confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability) is described, and model-based and model-free variants of the elimination method are provided.
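A sketch of the elimination idea in the simplest (bandit) setting: keep a Hoeffding-style confidence interval around each arm's empirical mean and drop arms whose upper bound falls below the best lower bound. The arm means, confidence-radius formula, and horizon here are assumptions for the demo, not the paper's exact construction:

```python
import numpy as np

# Illustrative action elimination for a stochastic bandit; constants and
# the confidence radius are standard Hoeffding-style choices, not the
# paper's exact ones.
rng = np.random.default_rng(1)
means = np.array([0.3, 0.5, 0.9])    # hypothetical Bernoulli arm means
counts = np.zeros(3)
sums = np.zeros(3)
active = set(range(3))
delta = 0.05

for _ in range(5000):
    if len(active) == 1:
        break
    for a in list(active):           # pull every surviving arm once
        sums[a] += rng.binomial(1, means[a])
        counts[a] += 1
    emp = sums / np.maximum(counts, 1)
    rad = np.sqrt(np.log(4 * counts**2 / delta) / (2 * np.maximum(counts, 1)))
    best_lower = max(emp[a] - rad[a] for a in active)
    active = {a for a in active if emp[a] + rad[a] >= best_lower}

print(sorted(active))                # only the best arm should survive
```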
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
The bandit problem is revisited under the PAC model, and it is shown that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability at least 1 − δ.
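For intuition, here is the naive PAC baseline: pull every arm m = ⌈(2/ε²) ln(2n/δ)⌉ times and return the empirically best one; Hoeffding plus a union bound gives an ε-optimal arm with probability at least 1 − δ. The paper's median-elimination algorithm improves the log factor from log(n/δ) to log(1/δ); this simpler sketch (with made-up arm means) just illustrates the style of guarantee:

```python
import math
import random

# Naive PAC best-arm baseline; the arm means below are made up.
def naive_pac_best_arm(pull, n, eps, delta):
    # sample count per arm from Hoeffding + union bound
    m = math.ceil((2 / eps**2) * math.log(2 * n / delta))
    emp = [sum(pull(a) for _ in range(m)) / m for a in range(n)]
    return max(range(n), key=lambda a: emp[a])

random.seed(2)
means = [0.4, 0.5, 0.7]
best = naive_pac_best_arm(lambda a: 1 if random.random() < means[a] else 0,
                          n=3, eps=0.1, delta=0.05)
print(best)
```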
A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
This paper presents a new algorithm that, given only a generative model (a natural and common type of simulator) for an arbitrary MDP, performs on-line, near-optimal planning with a per-state running time that has no dependence on the number of states.
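A toy sketch of the sparse-sampling idea: estimate Q-values by drawing C successor samples per (state, action) from the generative model and recursing to a fixed depth H, so the work per decision scales with (C·|A|)^H rather than with the number of states. The chain MDP below is deterministic and purely illustrative:

```python
# Toy sparse-sampling sketch; the chain MDP, C, and depth are illustrative.
GAMMA = 0.9

def model(s, a):
    # hypothetical generative model: action 1 moves right, action 0 stays;
    # reaching state 3 yields reward 1
    s2 = min(s + 1, 3) if a == 1 else s
    return s2, (1.0 if s2 == 3 else 0.0)

def q_estimate(s, a, depth, C=4):
    if depth == 0:
        return 0.0
    total = 0.0
    for _ in range(C):                      # C samples per (state, action)
        s2, r = model(s, a)
        v2 = max(q_estimate(s2, b, depth - 1, C) for b in (0, 1))
        total += r + GAMMA * v2
    return total / C

print(max((0, 1), key=lambda a: q_estimate(0, a, depth=3)))  # prints 1
```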
Domain Adaptation: Learning Bounds and Algorithms
A novel distance between distributions, discrepancy distance, is introduced that is tailored to adaptation problems with arbitrary loss functions, and Rademacher complexity bounds are given for estimating the discrepancy distance from finite samples for different loss functions. Expand
Constant depth circuits, Fourier transform, and learnability
In this paper, Boolean functions in AC0 are studied using harmonic analysis on the cube. The main result is that an AC0 Boolean function has almost all of its "power spectrum" concentrated on the low-order coefficients.
Learning Rates for Q-learning
This paper derives convergence rates for Q-learning with a polynomial learning rate and shows that for a linear learning rate, i.e., 1/t at time t, the convergence rate has an exponential dependence on 1/(1-γ).
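The polynomial schedule can be sketched concretely: tabular Q-learning with step size α_t = 1/t^0.8 on a toy two-state MDP. The MDP, exponent, and horizon are illustrative assumptions; action a moves to state a and reaching state 1 pays reward 1, so with γ = 0.5 the optimal values are Q(s, 1) = 2 and Q(s, 0) = 1:

```python
import numpy as np

# Tabular Q-learning with a polynomial learning rate on a toy MDP; the
# environment and constants are illustrative, not from the paper.
rng = np.random.default_rng(4)
gamma = 0.5
Q = np.zeros((2, 2))
visits = np.zeros((2, 2))
s = 0
for _ in range(20000):
    a = int(rng.integers(2))             # uniform exploration
    s2, r = a, (1.0 if a == 1 else 0.0)  # action a moves to state a
    visits[s, a] += 1
    alpha = visits[s, a] ** -0.8         # polynomial learning rate 1/t^0.8
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(np.round(Q, 2))                    # approaches [[1, 2], [1, 2]]
```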
Nash Convergence of Gradient Dynamics in General-Sum Games
This work analyzes the behavior of agents that incrementally adapt their strategy through gradient ascent on expected payoff, in the simple setting of two-player, two-action, iterated general-sum games, and shows that either the agents converge to a Nash equilibrium, or, if the strategies themselves do not converge, their average payoffs nevertheless converge to the payoffs of a Nash equilibrium.
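The second case can be seen in matching pennies, a zero-sum special case of the two-player, two-action games analyzed: the mixed strategies cycle rather than converge, yet the average payoff approaches the Nash value 0. Starting point, step size, and horizon below are illustrative assumptions:

```python
# Gradient-ascent dynamics in matching pennies. The row player's expected
# payoff is u(x, y) = 4xy - 2x - 2y + 1, where x and y are the players'
# probabilities of playing "heads"; the column player's payoff is -u.
x, y = 0.8, 0.2
eta = 0.01
total, steps = 0.0, 100000
for _ in range(steps):
    total += 4*x*y - 2*x - 2*y + 1
    gx = 4*y - 2                      # du/dx (row ascends on u)
    gy = 2 - 4*x                      # d(-u)/dy (column ascends on -u)
    x = min(1.0, max(0.0, x + eta * gx))   # project back onto [0, 1]
    y = min(1.0, max(0.0, y + eta * gy))

print(round(total / steps, 2))        # average payoff near the Nash value 0
```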
Domain Adaptation with Multiple Sources
It is proved that standard convex combinations of the source hypotheses may in fact perform very poorly and that, instead, combinations weighted by the source distributions benefit from favorable theoretical guarantees. Expand
Learning decision trees using the Fourier spectrum
The authors demonstrate that any function f whose L1-norm is polynomial can be approximated by a polynomially sparse function, and prove that Boolean decision trees with linear operations are a subset of this class of functions.
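A small concrete version of the sparse-approximation idea: compute the Fourier (Walsh) spectrum of a Boolean function over {-1, +1}^3 and keep only the large coefficients. The example function, a depth-2 decision tree, and the 0.1 threshold are illustrative choices:

```python
import itertools
import numpy as np

# Exhaustive Fourier spectrum of a tiny Boolean function; the function and
# threshold are illustrative.
n = 3
def f(x):
    # decision tree: output x1 if x0 == 1, else x2 (values in {-1, +1})
    return x[1] if x[0] == 1 else x[2]

points = list(itertools.product([-1, 1], repeat=n))
subsets = itertools.chain.from_iterable(
    itertools.combinations(range(n), k) for k in range(n + 1))
# Fourier coefficient for subset S: E_x[ f(x) * prod_{i in S} x_i ]
coeffs = {S: np.mean([f(x) * np.prod([x[i] for i in S]) for x in points])
          for S in subsets}

# keep only the large coefficients; for this tree they reconstruct f exactly
sparse = {S: c for S, c in coeffs.items() if abs(c) > 0.1}
approx = lambda x: sum(c * np.prod([x[i] for i in S]) for S, c in sparse.items())
print(all(np.sign(approx(x)) == f(x) for x in points))  # sparse part recovers f
```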