Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
An adaptation of actor-critic methods is presented that considers the action policies of other agents and successfully learns policies requiring complex multi-agent coordination.
Constrained Policy Optimization
Constrained Policy Optimization (CPO) is proposed as the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees of near-constraint satisfaction at each iteration; it makes it possible to train neural network policies for high-dimensional control while making guarantees about policy behavior throughout training.
Value Iteration Networks
This work introduces the value iteration network (VIN), a fully differentiable neural network with an embedded "planning module", and shows that, by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
Model-Ensemble Trust-Region Policy Optimization
This paper analyzes the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and shows that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.
Risk-Sensitive and Robust Decision-Making: a CVaR Optimization Approach
This paper shows that a CVaR objective, besides capturing risk sensitivity, has an alternative interpretation as expected cost under worst-case modeling errors for a given error budget; it also presents an approximate value-iteration algorithm for CVaR MDPs and analyzes its convergence rate.
Bayesian Reinforcement Learning: A Survey
- M. Ghavamzadeh, Shie Mannor, Joelle Pineau, Aviv Tamar
- Computer Science, Found. Trends Mach. Learn.
- 18 November 2015
An in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm, and a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.
A Deep Reinforcement Learning Perspective on Internet Congestion Control
- Nathan Jay, Noga H. Rotman, Brighten Godfrey, Michael Schapira, Aviv Tamar
- Computer Science, ICML
- 24 May 2019
It is shown that casting congestion control as RL enables training deep network policies that capture intricate patterns in data traffic and network conditions, and leverage this to outperform the state-of-the-art.
Policy Gradients with Variance Related Risk Criteria
A framework of local policy-gradient-style algorithms for reinforcement learning with variance-related criteria, that is, criteria that involve both the expected cost and the variance of the cost.
Learning Plannable Representations with Causal InfoGAN
- Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart J. Russell, P. Abbeel
- Computer Science, NeurIPS
- 24 July 2018
This work asks how to imagine goal-directed visual plans – a plausible sequence of observations that transition a dynamical system from its current configuration to a desired goal state, which can later be used as a reference trajectory for control.
Optimizing the CVaR via Sampling
A novel sampling-based estimator for the gradient of the CVaR, in the spirit of the likelihood-ratio method, is proposed; the bias of the estimator is analyzed, and the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum is proved.