dm_control: Software and Tasks for Continuous Control
- Yuval Tassa, S. Tunyasuvunakool, N. Heess
- Computer Science · Softw. Impacts
- 22 June 2020
Improved Image Captioning via Policy Gradient optimization of SPIDEr
- Siqi Liu, Zhenhai Zhu, Ning Ye, S. Guadarrama, K. Murphy
- Computer Science · IEEE International Conference on Computer Vision
- 1 December 2016
This paper shows how to use a policy gradient (PG) method to directly optimize a linear combination of SPICE and CIDEr (a combination the authors call SPIDEr). Human raters strongly prefer the resulting image captions over captions generated by the same model trained with MLE or trained to optimize the COCO metrics.
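The SPIDEr summary combines two ideas. The first is a scalar reward formed as a linear combination of SPICE and CIDEr. The second is a policy-gradient update that raises the log-probability of captions in proportion to their advantage. The sketch below illustrates both under stated assumptions:

- SPICE and CIDEr scores are taken as given numbers, since computing them requires the captioning evaluation toolkits.
- The equal weighting `w = 0.5` is an assumption.
- The "policy" is a hypothetical softmax over three fixed candidate captions, standing in for a caption decoder.

It uses plain REINFORCE with a batch-mean baseline. The paper's actual estimator and baseline may differ.

```python
import math
import random
from typing import List, Sequence


def spider(spice: float, cider: float, w: float = 0.5) -> float:
    """SPIDEr-style reward: a linear combination of SPICE and CIDEr.

    The equal weighting w = 0.5 is an assumption of this sketch.
    """
    return w * spice + (1.0 - w) * cider


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_grad(logits: Sequence[float], sampled: int,
                   reward: float, baseline: float) -> List[float]:
    """Gradient of (reward - baseline) * log softmax(logits)[sampled]
    w.r.t. the logits, treating the baseline as a constant."""
    p = softmax(logits)
    adv = reward - baseline
    return [adv * ((1.0 if i == sampled else 0.0) - p[i]) for i in range(len(logits))]


def train(rewards: Sequence[float], steps: int = 500, batch: int = 64,
          lr: float = 1.0, seed: int = 0) -> List[float]:
    """Toy policy: a softmax over a fixed set of candidate captions
    (a hypothetical stand-in for a caption decoder). Each step samples a
    batch, uses the batch-mean reward as the baseline, and ascends the
    batch-averaged REINFORCE gradient."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        p = softmax(logits)
        actions = rng.choices(range(len(rewards)), weights=p, k=batch)
        baseline = sum(rewards[a] for a in actions) / batch
        grad = [0.0] * len(logits)
        for a in actions:
            g = reinforce_grad(logits, a, rewards[a], baseline)
            grad = [gi + x / batch for gi, x in zip(grad, g)]
        logits = [x + lr * g for x, g in zip(logits, grad)]
    return softmax(logits)


# Hypothetical (SPICE, CIDEr) scores for three candidate captions.
scores = [(0.10, 0.50), (0.20, 0.90), (0.25, 1.20)]
rewards = [spider(s, c) for s, c in scores]
probs = train(rewards)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(q, 3) for q in probs])
```

Because the baseline is the batch-mean reward, the advantages within each batch sum to zero. As a result, only candidates that score above the batch average gain probability mass.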
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
- H. F. Song, A. Abdolmaleki, M. Botvinick
- Computer Science · International Conference on Learning…
- 26 September 2019
This paper introduces V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO). V-MPO performs policy iteration based on a learned state-value function, and it does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters.
A Generalized Training Approach for Multiagent Learning
- Paul Muller, Shayegan Omidshafiei, R. Munos
- Computer Science · International Conference on Learning…
- 27 September 2019
This paper extends the theoretical underpinnings of PSRO by considering an alternative solution concept, $\alpha$-Rank. Unlike Nash equilibrium, $\alpha$-Rank is unique and thus faces no equilibrium-selection issues, and it applies readily to general-sum, many-player settings. The paper also establishes convergence guarantees in several classes of games.
Emergent Coordination Through Competition
- Siqi Liu, Guy Lever, J. Merel, S. Tunyasuvunakool, N. Heess, T. Graepel
- Education · International Conference on Learning…
- 19 February 2019
This study demonstrates that the automatic optimization of simple shaping rewards, not themselves conducive to co-operative behavior, can lead to long-horizon team behavior in large scale multi-agent training in continuous control.
Observational Learning by Reinforcement Learning
- Diana Borsa, Bilal Piot, Siqi Liu, R. Munos, O. Pietquin
- Computer Science · Adaptive Agents and Multi-Agent Systems
- 20 June 2017
It is argued that observational learning can emerge from pure reinforcement learning (RL), potentially coupled with memory. An RL agent can leverage the information gained by observing another agent perform a task in a shared environment.
Hierarchical visuomotor control of humanoids
- J. Merel, Arun Ahuja, Greg Wayne
- Computer Science · International Conference on Learning…
- 27 September 2018
An architecture capable of surprisingly flexible, task-directed motor control of a relatively high-DoF humanoid body is developed. It combines pre-trained low-level motor controllers with a high-level, task-focused controller that switches among low-level sub-policies.
From Motor Control to Team Play in Simulated Humanoid Football
This work optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data, resulting in a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analysis and statistics.
The Body is Not a Given: Joint Agent Policy Learning and Morphology Evolution
- D. Banarse, Yoram Bachrach, T. Graepel
- Computer Science · Adaptive Agents and Multi-Agent Systems
- 8 May 2019
This work proposes a method for uncovering strong agents, consisting of a good combination of a body and policy, based on combining RL with an evolutionary procedure, and uses the Shapley value from cooperative game theory to find the fair contribution of individual components, taking into account synergies between components.
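The Shapley value assigns each component its average marginal contribution over all orderings of the components:

$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\bigl(v(S \cup \{i\}) - v(S)\bigr)$

The sketch below computes this exactly for a small, hypothetical set of components (`legs`, `arms`, `policy`). It uses a made-up characteristic function `v` that includes a synergy term. The summary above does not say how the paper defines `v` (for example, as the performance of an agent assembled from a subset of components), so treat the game as illustrative only.

```python
from itertools import combinations
from math import factorial, isclose
from typing import Callable, Dict, FrozenSet, Hashable, Sequence


def shapley_values(players: Sequence[Hashable],
                   v: Callable[[FrozenSet], float]) -> Dict[Hashable, float]:
    """Exact Shapley values:
    phi_i = sum over S not containing i of
            |S|! (n - |S| - 1)! / n! * (v(S | {i}) - v(S)).
    Enumerates every coalition of the other players, so it is only
    practical for a small number of players."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # coalition size |S| = 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


# Hypothetical characteristic function: each component adds 0.1 on its own,
# and legs + policy together add a synergy bonus of 0.5.
def v(coalition: FrozenSet) -> float:
    value = 0.1 * len(coalition)
    if {"legs", "policy"} <= coalition:
        value += 0.5
    return value


components = ["legs", "arms", "policy"]
phi = shapley_values(components, v)
print(phi)

# Efficiency axiom: the values sum to v(N) - v(empty set).
assert isclose(sum(phi.values()), v(frozenset(components)) - v(frozenset()))
```

In this toy game the synergy bonus is split evenly between `legs` and `policy`, which is how the Shapley value credits components for synergies. Exact enumeration needs roughly n·2^n calls to `v`, so sampling over permutations is a common approximation when there are many components.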
Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity
- M. Garnelo, Wojciech M. Czarnecki, D. Balduzzi
- Computer Science · Adaptive Agents and Multi-Agent Systems
- 8 October 2021
This paper provides evidence for the importance of diversity in multi-agent training. It analyzes how applying different interaction graphs affects the training trajectories, diversity, and performance of populations in a range of games.
...
...