Publications
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
TLDR
An adaptation of actor-critic methods is presented that considers the action policies of other agents and can successfully learn policies requiring complex multi-agent coordination.
Constrained Policy Optimization
TLDR
Constrained Policy Optimization (CPO) is proposed: the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees of near-constraint satisfaction at each iteration. It allows neural network policies to be trained for high-dimensional control while providing guarantees about policy behavior throughout training.
Value Iteration Networks
TLDR
This work introduces the value iteration network (VIN), a fully differentiable neural network with an embedded "planning module", and shows that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
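As a rough illustration of the computation a VIN planning module is designed to imitate, the sketch below runs plain tabular value iteration on a small grid. It is not the VIN architecture itself: the grid, the four deterministic moves, the per-step cost of -1 and the absorbing goal cell are all assumptions of this example. Each sweep computes one Q-value per action and then takes the maximum over actions, which is the step VIN implements with convolutions followed by max-pooling over action channels.

```python
from typing import List, Tuple

Grid = List[List[float]]

# Hypothetical action set: four deterministic moves (up, down, left, right).
MOVES: List[Tuple[int, int]] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def value_iteration(reward: Grid, goal: Tuple[int, int],
                    gamma: float = 1.0, iterations: int = 50) -> Grid:
    """Tabular value iteration on a grid with an absorbing goal cell."""
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_value = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                if (i, j) == goal:
                    continue  # the absorbing goal keeps value 0
                q_values = []
                for di, dj in MOVES:
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < rows and 0 <= nj < cols):
                        ni, nj = i, j  # moving off the grid leaves the agent in place
                    # one "Q channel" per action: immediate reward + discounted next value
                    q_values.append(reward[i][j] + gamma * value[ni][nj])
                # max over action channels gives the new value estimate
                new_value[i][j] = max(q_values)
        value = new_value
    return value


# Hypothetical 4x4 grid where every step costs 1 and the goal is the top-left cell.
step_cost = [[-1.0] * 4 for _ in range(4)]
V = value_iteration(step_cost, goal=(0, 0))
print(V[3][3])  # -6.0: six steps from the opposite corner to the goal
```

With a uniform step cost and no discounting, the converged values are just negative shortest-path distances to the goal, which makes the result easy to check by hand.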
Model-Ensemble Trust-Region Policy Optimization
TLDR
This paper analyzes the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and shows that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
Bayesian Reinforcement Learning: A Survey
TLDR
An in-depth review of the role of Bayesian methods in the reinforcement learning (RL) paradigm, and a comprehensive survey of Bayesian RL algorithms and their theoretical and empirical properties.
Policy Gradients with Variance Related Risk Criteria
TLDR
A framework of local policy-gradient-style algorithms for reinforcement learning with variance-related criteria, i.e. criteria that involve both the expected cost and the variance of the cost.
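The variance of a cost C can be differentiated with the same likelihood-ratio trick as its mean: since Var[C] = E[C^2] - E[C]^2, its gradient is dE[C^2]/dtheta - 2 E[C] dE[C]/dtheta, and both expectations can be estimated from sampled costs weighted by the score d/dtheta log f(C; theta). The sketch below does exactly that; it is an illustration under assumptions, not necessarily the algorithm in the paper. The i.i.d. sampling setup, the function name `mean_variance_gradients` and the Gaussian test case (where theta is the standard deviation sigma, so the exact gradients are 0 and 2 * sigma) are all hypothetical.

```python
import random
from typing import List, Tuple


def mean_variance_gradients(costs: List[float],
                            scores: List[float]) -> Tuple[float, float]:
    """Likelihood-ratio estimates of dE[C]/dtheta and dVar[C]/dtheta.

    costs:  sampled costs C_i ~ f(.; theta)
    scores: d/dtheta log f(C_i; theta) for each sample
    """
    n = len(costs)
    mean_c = sum(costs) / n
    # dE[C]/dtheta = E[C * score]
    grad_mean = sum(c * s for c, s in zip(costs, scores)) / n
    # dE[C^2]/dtheta = E[C^2 * score]
    grad_second = sum(c * c * s for c, s in zip(costs, scores)) / n
    # Var = E[C^2] - E[C]^2, so dVar = dE[C^2] - 2 E[C] dE[C]
    # (reusing the same samples for E[C] adds a small O(1/n) bias)
    grad_var = grad_second - 2.0 * mean_c * grad_mean
    return grad_mean, grad_var


# Hypothetical test case: C ~ Normal(m, sigma^2), theta = sigma.
rng = random.Random(0)
m, sigma = 1.0, 0.5
costs = [rng.gauss(m, sigma) for _ in range(100_000)]
# d/dsigma log N(c; m, sigma^2) = -1/sigma + (c - m)^2 / sigma^3
scores = [-1.0 / sigma + (c - m) ** 2 / sigma ** 3 for c in costs]
grad_mean, grad_var = mean_variance_gradients(costs, scores)
# Exact values: dE[C]/dsigma = 0 and dVar[C]/dsigma = 2 * sigma = 1.0
print(grad_mean, grad_var)
```

Once both gradients are available, any differentiable combination of mean and variance (for example a penalized objective E[C] + lambda * Var[C], with lambda chosen by the user) can be followed by the chain rule.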
Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
TLDR
This paper shows that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors for a given error budget; it also presents an approximate value-iteration algorithm for CVaR MDPs and analyzes its convergence rate.
Learning Plannable Representations with Causal InfoGAN
TLDR
This work asks how to imagine goal-directed visual plans: plausible sequences of observations that transition a dynamical system from its current configuration to a desired goal state, and that can later be used as reference trajectories for control.
A Deep Reinforcement Learning Perspective on Internet Congestion Control
TLDR
It is shown that casting congestion control as RL enables training deep network policies that capture intricate patterns in data traffic and network conditions, and that these policies can leverage such patterns to outperform the state of the art.
Optimizing the CVaR via Sampling
TLDR
A novel sampling-based estimator for the gradient of the CVaR, in the spirit of the likelihood-ratio method, is proposed; the bias of the estimator is analyzed, and a corresponding stochastic gradient descent algorithm is proved to converge to a local CVaR optimum.
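A minimal sketch of a likelihood-ratio CVaR gradient estimator of the kind described above, assuming the samples are returns (higher is better) and CVaR_alpha is the mean of the worst (lowest) alpha-fraction of outcomes. Each tail sample contributes its score d/dtheta log f(x; theta) weighted by its distance below the empirical value at risk; samples above the quantile contribute nothing. The function names and the Gaussian test case are hypothetical, and this is one way to write such an estimator rather than a verbatim copy of the paper's formula.

```python
import math
import random
from typing import List


def empirical_var(samples: List[float], alpha: float) -> float:
    """Empirical lower alpha-quantile (value at risk) of the samples."""
    ordered = sorted(samples)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)  # 0-based index
    return ordered[k]


def empirical_cvar(samples: List[float], alpha: float) -> float:
    """Mean of the samples at or below the empirical alpha-quantile."""
    v = empirical_var(samples, alpha)
    tail = [x for x in samples if x <= v]
    return sum(tail) / len(tail)


def cvar_gradient_lr(samples: List[float], scores: List[float],
                     alpha: float) -> float:
    """Likelihood-ratio estimate of dCVaR_alpha/dtheta.

    samples: draws x_i ~ f(.; theta), e.g. returns of sampled trajectories
    scores:  d/dtheta log f(x_i; theta) for each draw
    Only tail samples (x_i <= empirical VaR) contribute.
    """
    v = empirical_var(samples, alpha)
    total = sum(s * (x - v) for x, s in zip(samples, scores) if x <= v)
    return total / (alpha * len(samples))


# Hypothetical test case: X ~ Normal(mu, 1), theta = mu, so the score is x - mu.
rng = random.Random(0)
mu, alpha = 2.0, 0.1
xs = [rng.gauss(mu, 1.0) for _ in range(100_000)]
scores = [x - mu for x in xs]
cvar_hat = empirical_cvar(xs, alpha)
grad_hat = cvar_gradient_lr(xs, scores, alpha)
# Exact values: CVaR = mu - pdf(z_alpha)/alpha, about 0.245, and dCVaR/dmu = 1
print(cvar_hat, grad_hat)
```

In a stochastic gradient method, `grad_hat` would be recomputed from a fresh batch at each step and used to update theta; for costs rather than returns, the same idea applies to the upper tail.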