Towards Learning Multi-Agent Negotiations via Self-Play

  • Yichuan Tang
  • Published 1 October 2019
  • Computer Science
  • 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
Making sophisticated, robust, and safe sequential decisions is at the heart of intelligent systems. This is especially critical for planning in complex multi-agent environments, where agents need to anticipate other agents' intentions and possible future actions. Traditional methods formulate the problem as a Markov Decision Process, but the solutions often rely on various assumptions and become brittle when presented with corner cases. In contrast, deep reinforcement learning (Deep RL) has… 
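The abstract contrasts Deep RL with the classical Markov Decision Process formulation. As an illustration of that classical approach, here is a toy value-iteration sketch on a made-up two-state MDP (the states, rewards, and function names are invented for this example, not taken from the paper):

```python
# Toy value iteration on a 2-state deterministic MDP (illustrative only).
def value_iteration(P, R, gamma=0.9, iters=200):
    """P[s][a] -> next state, R[s][a] -> reward; returns the value function."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        # Bellman optimality backup: V(s) = max_a [ R(s,a) + gamma * V(s') ]
        V = [max(R[s][a] + gamma * V[P[s][a]] for a in range(len(P[s])))
             for s in range(n)]
    return V

# State 0: action 0 stays (r=0), action 1 moves to state 1 (r=1).
# State 1: only action 0, which stays (r=2).
P = [[0, 1], [1]]
R = [[0.0, 1.0], [2.0]]
V = value_iteration(P, R)
```

The fixed point here is V[1] = 2 / (1 - 0.9) = 20 and V[0] = 1 + 0.9 * 20 = 19; the brittleness the abstract mentions comes from needing an explicit model (P, R), which multi-agent corner cases rarely admit.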
Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning
A new multi-agent policy gradient method, Robust Local Advantage (ROLA) Actor-Critic, that allows each agent to learn an individual action-value function as a local critic while ameliorating environment non-stationarity via a novel centralized training approach based on a centralized critic.
Learning to Robustly Negotiate Bi-Directional Lane Usage in High-Conflict Driving Scenarios
This paper introduces a previously unconsidered, yet everyday, high-conflict driving scenario requiring negotiations between agents of equal rights and priorities, and proposes Discrete Asymmetric Soft Actor-Critic (DASAC), a maximum-entropy off-policy MARL algorithm allowing for centralized training with decentralized execution.
Thompson sampling for Markov games with piecewise stationary opponent policies
An algorithm (TSMG) is proposed for general-sum Markov games against agents that switch between several stationary policies, combining change detection with Thompson sampling to learn parametric models of these policies.
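Thompson sampling, the core ingredient of TSMG above, can be illustrated with a toy Beta-Bernoulli sketch: sample a plausible "match rate" for each candidate opponent model from its posterior and act on the best sample. This is a generic bandit illustration with invented names and numbers, not the TSMG algorithm itself:

```python
import random

def thompson_select(successes, failures, rng=random.Random(0)):
    """Sample a match-rate estimate per model from Beta posteriors, pick argmax."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Simulated prediction task: model 1 truly matches the opponent 80% of the
# time, model 0 only 20%. Thompson sampling should concentrate on model 1.
successes, failures = [0, 0], [0, 0]
rng = random.Random(1)
for _ in range(500):
    i = thompson_select(successes, failures)
    if rng.random() < (0.8 if i == 1 else 0.2):
        successes[i] += 1
    else:
        failures[i] += 1
```

Sampling from the posterior (rather than acting on the posterior mean) is what gives the method its built-in exploration, which is also what lets TSMG react when the opponent switches policies.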
A Survey of Deep RL and IL for Autonomous Driving Policy Learning
This is the first survey to focus on AD policy learning using DRL/DIL, which is addressed simultaneously from the system, task-driven and problem-driven perspectives.
Mono-Video Deep Adaptive Cruise Control in the Image Space via Behavior Cloning
  • Maxim Dolgov, T. Michalke
  • Computer Science
  • 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)
  • 2020
This work designs an ACC system that relies on quantities directly extractable from image data, namely time-to-collision and the size of the target object detection, and learns a deep neural network acceleration controller by copying the behavior of a classical controller.
The robot consciousness based on empirical knowledge
This paper proposes robot consciousness based on empirical knowledge, arguing that a robot's empirical knowledge is an important basis for its consciousness and that any cognitive experience of the robot can lead to the generation of consciousness.
Contingencies from Observations: Tractable Contingency Planning with Learned Behavior Models
A general-purpose contingency planner that is learned end-to-end using high-dimensional scene observations and low-dimensional behavioral observations, and it is shown how this model can tractably learn contingencies from behavioral observations.
Learning to Simulate Self-Driven Particles System with Coordinated Policy Optimization
A novel MARL method called Coordinated Policy Optimization (CoPO), which incorporates social psychology principle to learn neural controller for SDP, is developed, which can achieve superior performance compared to MARL baselines in various metrics.
MIDAS: Multi-agent Interaction-aware Decision-making with Adaptive Strategies for Urban Autonomous Navigation
MIDAS uses an attention mechanism to handle an arbitrary number of other agents and includes a "driver-type" parameter to learn a single policy that works across different planning objectives and is safer and more efficient than existing approaches to interaction-aware decision-making.
Reward (Mis)design for Autonomous Driving
To diagnose common errors, 8 simple sanity checks are developed for identifying flaws in reward functions from past work on reinforcement learning for autonomous driving (AD), revealing near-universal flaws in reward design for AD that may also exist pervasively across reward design for other tasks.


A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning
An algorithm is described, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection, which generalizes previous ones such as InRL.
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
Emergent Complexity via Multi-Agent Competition
This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics, and points out that such environments come with a natural curriculum, because, for any skill level, an environment full of agents at that level presents the right degree of difficulty.
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
This paper applies deep reinforcement learning to the problem of forming long term driving strategies and shows how policy gradient iterations can be used without Markovian assumptions, and decomposes the problem into a composition of a Policy for Desires and trajectory planning with hard constraints.
Multi-agent Reinforcement Learning: An Overview
This chapter reviews a representative selection of multi-agent reinforcement learning algorithms for fully cooperative, fully competitive, and more general (neither cooperative nor competitive) tasks.
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
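IMPALA's key off-policy correction is V-trace, which reweights the policy-gradient targets with truncated importance ratios. A simplified single-trajectory sketch of the backward V-trace recursion follows (function and variable names are mine; the actual implementation is batched and framework-based):

```python
import math

def vtrace(rewards, values, bootstrap, log_rhos,
           gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Backward recursion for V-trace targets on one trajectory (simplified).
    log_rhos are log pi(a|s) - log mu(a|s) under target/behavior policies."""
    T = len(rewards)
    rhos = [min(rho_bar, math.exp(l)) for l in log_rhos]  # truncated IS weights
    cs = [min(c_bar, r) for r in rhos]                    # trace cutting coeffs
    vs = [0.0] * T
    next_value, next_vs = bootstrap, bootstrap
    for t in reversed(range(T)):
        # Importance-weighted TD error at step t.
        delta = rhos[t] * (rewards[t] + gamma * next_value - values[t])
        vs[t] = values[t] + delta + gamma * cs[t] * (next_vs - next_value)
        next_value, next_vs = values[t], vs[t]
    return vs
```

In the on-policy case (all log-ratios zero) the targets reduce to ordinary n-step returns, which is a handy sanity check; the truncation constants rho_bar and c_bar bound the variance introduced by the actor-learner lag.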
Model-free Deep Reinforcement Learning for Urban Autonomous Driving
This paper proposes a framework to enable model-free deep reinforcement learning in challenging urban autonomous driving scenarios, and designs a specific input representation and uses visual encoding to capture the low-dimensional latent states.
End-to-End Deep Reinforcement Learning for Lane Keeping Assist
This work examines how restrictions placed on the car during the learning phase affect the time to convergence, and the results demonstrate learning of autonomous maneuvering in a scenario with complex road curvature and simple interaction with other vehicles.
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
This paper introduces the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge, combining fictitious self-play with deep reinforcement learning.
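The fictitious-self-play idea underlying that paper can be shown in miniature with classical fictitious play on matching pennies, where each player best-responds to the opponent's empirical action frequencies; the deep-RL components of the actual method are omitted, and this toy code is only an illustration:

```python
def fictitious_play(rounds=10000):
    """Fictitious play on matching pennies; returns empirical action frequencies.
    Row player wins by matching the column player, column player by mismatching."""
    row_counts = [0, 0]  # how often row played action 0 / action 1
    col_counts = [0, 0]
    for _ in range(rounds):
        # Row best-responds by matching column's more frequent action.
        row_a = 0 if col_counts[0] >= col_counts[1] else 1
        # Column best-responds by mismatching row's more frequent action.
        col_a = 1 if row_counts[0] >= row_counts[1] else 0
        row_counts[row_a] += 1
        col_counts[col_a] += 1
    return ([c / rounds for c in row_counts],
            [c / rounds for c in col_counts])

row_freq, col_freq = fictitious_play()
```

The play itself cycles, but the empirical frequencies converge toward the mixed Nash equilibrium (1/2, 1/2); NFSP's average-policy network plays the role these frequency counts play here.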
Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
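The deterministic policy gradient this last entry builds on chains the critic's action gradient through the policy: grad_theta J = dQ/da * dmu/dtheta evaluated at a = mu(s). A hand-rolled 1-D toy version (with an invented quadratic critic, not the paper's learned networks) makes the update concrete:

```python
def dpg_step(theta, s, lr=0.1):
    """One deterministic-policy-gradient ascent step on a toy 1-D problem.
    Policy: mu(s) = theta * s.  Critic: Q(s, a) = -(a - 2*s)**2, so the
    optimal policy is mu(s) = 2*s, i.e. theta = 2."""
    a = theta * s
    dq_da = -2.0 * (a - 2.0 * s)   # critic's gradient w.r.t. the action
    dmu_dtheta = s                 # policy's gradient w.r.t. its parameter
    return theta + lr * dq_da * dmu_dtheta

theta = 0.0
for _ in range(200):
    theta = dpg_step(theta, s=1.0)
```

Each step contracts the error toward theta = 2 by a factor of 0.8, so 200 iterations converge essentially exactly; DDPG replaces the closed-form gradients here with backpropagation through learned actor and critic networks.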