Publications
Off-Policy Actor-Critic
TLDR
This paper derives an incremental, linear time and space complexity algorithm that includes eligibility traces, proves convergence under assumptions similar to previous off-policy algorithms, and empirically shows better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems.
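A minimal sketch of the incremental actor update at the core of Off-PAC, assuming behaviour data reweighted by the importance ratio rho and a linear critic over features x; names such as grad_log_pi are illustrative rather than the paper's notation, and the gradient-TD critic updates and eligibility traces are omitted.

```python
import numpy as np

# Hedged sketch: one incremental Off-PAC-style actor step. The behaviour
# policy generates the data; the target policy's parameters theta are moved
# along rho * delta * grad_log_pi, where delta is the linear critic's TD error.
def offpac_actor_step(theta, w, x, x_next, r, rho, grad_log_pi,
                      alpha_actor=0.001, gamma=0.99):
    delta = r + gamma * w @ x_next - w @ x          # critic's TD error
    theta = theta + alpha_actor * rho * delta * grad_log_pi
    return theta
```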
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
TLDR
It is shown that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training.
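A minimal sketch of one emphatic TD(λ) update with linear function approximation, assuming per-step importance ratios rho and a user-specified interest in the current state; variable names are illustrative, not the paper's notation verbatim.

```python
import numpy as np

# Hedged sketch: single emphatic TD(lambda) step. The followon trace F and
# emphasis M reweight the eligibility trace so the expected update is stable
# under off-policy training.
def etd_step(w, e, F, x, r, x_next, rho, rho_prev, alpha=0.01,
             gamma=0.99, lam=0.9, interest=1.0):
    F = rho_prev * gamma * F + interest          # followon trace
    M = lam * interest + (1.0 - lam) * F         # emphasis
    delta = r + gamma * w @ x_next - w @ x       # TD error
    e = rho * (gamma * lam * e + M * x)          # emphatically weighted trace
    w = w + alpha * delta * e
    return w, e, F
```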
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
TLDR
This paper proposes a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias, and empirically verifies that the algorithm better controls estimation bias in toy environments and achieves superior performance on several benchmark problems.
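A minimal sketch of tabular Maxmin Q-learning under an assumed env.reset()/env.step() interface; N is the parameter the summary refers to for controlling estimation bias.

```python
import numpy as np

# Hedged sketch of tabular Maxmin Q-learning, assuming a small discrete
# environment whose step() returns (next_state, reward, done). Larger N
# pushes the min-based target toward underestimation.
def maxmin_q_learning(env, n_states, n_actions, N=2, episodes=500,
                      alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N, n_states, n_actions))           # N independent estimates

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            Q_min = Q.min(axis=0)                     # elementwise min over the ensemble
            a = (rng.integers(n_actions) if rng.random() < epsilon
                 else int(np.argmax(Q_min[s])))
            s_next, r, done = env.step(a)

            # Target bootstraps from the min-ensemble value of the greedy next action.
            target = r + (0.0 if done else gamma * Q_min[s_next].max())

            # Update one randomly chosen estimate.
            k = rng.integers(N)
            Q[k, s, a] += alpha * (target - Q[k, s, a])
            s = s_next
    return Q.min(axis=0)
```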
Meta-Learning Representations for Continual Learning
TLDR
It is shown that it is possible to learn naturally sparse representations that are more effective for online updating, and it is demonstrated that a basic online updating strategy on representations learned by OML is competitive with rehearsal-based methods for continual learning.
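A minimal sketch of the basic online updating strategy the summary refers to, assuming a representation function rep that has already been meta-trained (e.g. by OML) and is then frozen, with only a linear softmax head updated one example at a time; all names are illustrative.

```python
import numpy as np

# Hedged sketch: online updating on top of a frozen, meta-learned
# representation. W is a (num_classes x feat_dim) softmax head updated with
# one SGD step per streamed example.
def online_update(rep, W, stream, lr=0.01):
    for x, y in stream:                       # one (input, label) at a time
        z = rep(x)                            # frozen, sparse features
        logits = W @ z
        p = np.exp(logits - logits.max())
        p /= p.sum()                          # softmax probabilities
        grad = np.outer(p, z)                 # cross-entropy gradient w.r.t. W ...
        grad[y] -= z                          # ... minus the one-hot target row
        W -= lr * grad                        # single SGD step per example
    return W
```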
Estimating the class prior and posterior from noisy positives and unlabeled data
TLDR
This work develops a classification algorithm for estimating posterior distributions from positive-unlabeled data that is robust to noise in the positive labels and effective for high-dimensional data, and proves that the univariate transforms it employs preserve the class prior.
An Off-policy Policy Gradient Theorem Using Emphatic Weightings
TLDR
This work develops a new actor-critic algorithm called Actor-Critic with Emphatic weightings (ACE) that approximates the simplified gradients provided by the theorem, and demonstrates in a simple counterexample that previous off-policy policy gradient methods converge to the wrong solution whereas ACE finds the optimal solution.
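A minimal sketch of the ACE actor step, assuming a TD error delta supplied by a critic; the followon trace F and emphatic weighting M are what distinguish it from earlier off-policy actor-critics, and eta, interest, and the other names are illustrative placeholders.

```python
import numpy as np

# Hedged sketch: ACE-style actor update. The policy gradient is scaled by an
# emphatic weighting M built from the followon trace F, which corrects the
# state weighting that earlier off-policy actor-critic methods ignored.
# eta interpolates between the uncorrected (eta=0) and fully emphatic
# (eta=1) updates.
def ace_actor_step(theta, F, delta, rho, rho_prev, grad_log_pi,
                   alpha=0.001, gamma=0.99, eta=1.0, interest=1.0):
    F = rho_prev * gamma * F + interest                   # followon trace
    M = (1.0 - eta) * interest + eta * F                  # emphatic weighting
    theta = theta + alpha * rho * M * delta * grad_log_pi
    return theta, F
```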
Convex Multi-view Subspace Learning
TLDR
This paper develops an efficient algorithm that recovers an optimal data reconstruction by exploiting an implicit convex regularizer, then recovers the corresponding latent representation and reconstruction model, jointly and optimally.
Nonparametric semi-supervised learning of class proportions
TLDR
This work studies nonparametric class prior estimation, formulates the problem as estimating the mixing proportions in a two-component mixture model given a sample from one of the components and another sample from the mixture itself, and proposes an algorithm for estimating those proportions.
Unifying Task Specification in Reinforcement Learning
TLDR
This work introduces the RL task formalism, which provides a unification through simple constructs, including a generalization to transition-based discounting; it extends standard learning constructs such as Bellman operators and extends several seminal theoretical results, including approximation error bounds.
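A minimal sketch of a Bellman backup under transition-based discounting, the generalization the formalism builds on: the discount is a function of the transition (s, a, s') rather than a single constant, so episodic termination and continuing tasks are expressed by one operator. P, R, gamma_fn, and the tabular policy pi are illustrative placeholders, not the paper's notation.

```python
import numpy as np

# Hedged sketch: one sweep of a Bellman backup where the discount depends on
# the transition. pi[s, a] is the policy, P[s, a, s'] the transition
# probabilities, R[s, a, s'] the expected rewards, and gamma_fn(s, a, s')
# the transition-based discount (e.g. 0 on terminating transitions).
def bellman_backup(v, pi, P, R, gamma_fn, states, actions):
    v_new = np.zeros_like(v)
    for s in states:
        total = 0.0
        for a in actions:
            for s_next in states:
                total += pi[s, a] * P[s, a, s_next] * (
                    R[s, a, s_next] + gamma_fn(s, a, s_next) * v[s_next])
        v_new[s] = total
    return v_new
```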
Linear Off-Policy Actor-Critic
TLDR
This paper derives an incremental, linear time and space complexity algorithm that includes eligibility traces, proves convergence under assumptions similar to previous off-policy algorithms, and empirically shows better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems.
...