Decision-Making Under Uncertainty in Multi-Agent and Multi-Robot Systems: Planning and Learning

@inproceedings{Amato2018DecisionMakingUU,
  title={Decision-Making Under Uncertainty in Multi-Agent and Multi-Robot Systems: Planning and Learning},
  author={Chris Amato},
  booktitle={IJCAI},
  year={2018}
}
  • Chris Amato
  • Published in IJCAI 1 July 2018
  • Computer Science
Multi-agent planning and learning methods are becoming increasingly important in today's interconnected world. Methods for real-world domains, such as robotics, must consider uncertainty and limited communication in order to generate high-quality, robust solutions. This paper discusses our work on developing principled models to represent these problems and planning and learning methods that can scale to realistic multi-agent and multi-robot tasks. 
On Intelligent Decision Making in Multiagent Systems in Conditions of Uncertainty
TLDR
An intelligent information technology is proposed for integrating declarative languages, using Prolog as an example, with the NetLogo multiagent simulation environment; the approach is characterized by its use of the logical inference mechanism of the Prolog language.
Context-Aware Deep Q-Network for Decentralized Cooperative Reconnaissance by a Robotic Swarm
This paper addresses the problem of decentralized cooperation in a robotic swarm. The aim is to perform target search-and-destroy operations in an unknown/uncertain environment, without any …
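Methods of this kind rest on decentralized value learning. As a schematic stand-in for the paper's deep Q-network (whose architecture is not given here), the tabular sketch below has each robot condition its Q-values on a local observation plus a context feature; all sizes, hyperparameters, and the context encoding are invented for the example:

import numpy as np

# Hypothetical sizes: each robot sees a discretized local observation
# plus a small context id (e.g., a summary of nearby teammates).
N_OBS, N_CTX, N_ACT = 10, 4, 5
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

# One independent Q-table per robot, indexed by (observation, context, action).
Q = np.zeros((N_OBS, N_CTX, N_ACT))

def act(obs, ctx):
    # epsilon-greedy over the context-conditioned Q-values
    if rng.random() < eps:
        return int(rng.integers(N_ACT))
    return int(np.argmax(Q[obs, ctx]))

def update(obs, ctx, a, r, obs2, ctx2):
    # one-step Q-learning target computed purely from the local view
    td = r + gamma * Q[obs2, ctx2].max() - Q[obs, ctx, a]
    Q[obs, ctx, a] += alpha * td

update(0, 1, act(0, 1), 1.0, 2, 1)   # one illustrative transition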
Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
TLDR
A simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), which combines the first-order policy gradients and zeroth-order optimization methods to better optimize the conservative value functions over the actor parameters.
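A minimal sketch of the rectification step under stated assumptions (continuous actions clipped to [-1, 1]; q_fn stands in for the learned conservative critic; sigma, n_samples, and tau are illustrative): candidates are sampled around the actor's output and scored by the critic (the zeroth-order part), and the result serves as a target the actor is regressed toward alongside the usual first-order policy gradient.

import numpy as np

rng = np.random.default_rng(0)

def rectified_target(actor_action, q_fn, sigma=0.3, n_samples=16, tau=0.7):
    # zeroth-order search: perturb the actor's action and keep the
    # candidate the (conservative) critic scores highest
    cands = actor_action + sigma * rng.standard_normal((n_samples, actor_action.size))
    cands = np.clip(cands, -1.0, 1.0)
    best = cands[np.argmax([q_fn(c) for c in cands])]
    # interpolate toward the best candidate; the actor is then trained
    # to match this target in addition to the first-order gradient step
    return tau * actor_action + (1.0 - tau) * best

q_fn = lambda a: -np.sum((a - 0.5) ** 2)       # toy critic for demonstration
print(rectified_target(np.zeros(2), q_fn))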
Attention-Based Fault-Tolerant Approach for Multi-Agent Reinforcement Learning Systems
TLDR
An Attention-based Fault-Tolerant (FT-Attn) algorithm which selects correct and relevant information for each agent at every time step and enables the agents to learn effective communication policies through experience, concurrently with the action policies.
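The selection step can be pictured as attention over teammates' messages. Below is a minimal scaled dot-product version, without the learned projections or the fault model an actual FT-Attn-style implementation would need:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, messages):
    # score each teammate's message against this agent's query,
    # then aggregate messages by their normalized relevance
    scores = messages @ query / np.sqrt(query.size)
    weights = softmax(scores)
    return weights @ messages, weights

rng = np.random.default_rng(0)
msgs = rng.standard_normal((5, 8))            # 5 teammates, 8-dim messages
context, w = attend(rng.standard_normal(8), msgs)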
Hybrid-order Network Consensus for Distributed Multi-agent Systems
TLDR
This paper proposes a Motif-Aware Weighted Multi-agent System (MWMS) method for consensus control that focuses on the triangle motif in the network but can be extended to other kinds of motifs, and shows that the hybrid higher-order network can effectively enhance the consensus of the multi-agent system across different network topologies.
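A minimal sketch of the motif-weighting idea, assuming an undirected 0/1 adjacency matrix and a simple convex blend of edge weights with triangle-motif counts; the blend rule, step size, and parameters are assumptions for illustration, not the paper's exact MWMS construction:

import numpy as np

def motif_weighted_consensus(A, x, lam=0.5, eps=0.05, steps=200):
    M = (A @ A) * A                     # M[i,j] = #triangles through edge (i,j)
    W = (1 - lam) * A + lam * M         # blend first-order and motif weights
    L = np.diag(W.sum(axis=1)) - W      # weighted graph Laplacian
    for _ in range(steps):
        x = x - eps * (L @ x)           # discrete-time consensus iteration
    return x

# 4-node example: a triangle {0,1,2} plus a pendant node 3
A = np.array([[0,1,1,0],[1,0,1,0],[1,1,0,1],[0,0,1,0]], float)
print(motif_weighted_consensus(A, np.array([1.0, 2.0, 3.0, 4.0])))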
Game Method of Event Synchronization in Multi-agent Systems
TLDR
The influence of the algorithm's parameters on the convergence of the game method is investigated by means of a computer experiment, which shows how the training time of the agents' stochastic game depends on the basic parameters of the algorithm and supports the claim that partial compensation for uncertainty is ensured by the agents' capacity for self-learning and adaptive decision-making strategies.
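Self-learning players in stochastic games of this sort are often modeled with learning-automata updates; the classic linear reward-inaction scheme below is a generic stand-in (the paper's actual update rule is not reproduced here, and gamma_lr is an illustrative step size):

import numpy as np

def lri_update(p, a, reward, gamma_lr=0.1):
    # linear reward-inaction: on a favorable payoff, shift the mixed
    # strategy toward the action just played; otherwise leave it alone
    if reward > 0:
        e = np.zeros_like(p)
        e[a] = 1.0
        p = p + gamma_lr * (e - p)
    return p

p = lri_update(np.array([0.5, 0.5]), 0, 1.0)   # -> [0.55, 0.45]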
Aspects of Mechanism Design for Industry 4.0 Multi-Robot Task Auctioning
TLDR
This paper studies how varying the design of bidding-auction mechanisms affects multi-agent task auctioning, and demonstrates that well-designed mechanisms can lead to fair outcomes despite erroneous or biased bids for tasks from agents.
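For concreteness, one simple member of the mechanism family such studies compare is a greedy sequence of sealed-bid single-task auctions; the bid structure and the cost-minimizing award rule below are assumptions of this sketch:

def allocate_tasks(bids):
    # bids[robot][task] is the robot's (possibly biased) cost estimate;
    # repeatedly award the globally cheapest remaining (robot, task) pair
    assignment = {}
    free_robots = set(bids)
    open_tasks = {t for costs in bids.values() for t in costs}
    while free_robots and open_tasks:
        pair = min(
            ((r, t) for r in free_robots for t in open_tasks if t in bids[r]),
            key=lambda rt: bids[rt[0]][rt[1]],
            default=None,
        )
        if pair is None:          # no remaining robot can take any open task
            break
        robot, task = pair
        assignment[task] = robot
        free_robots.remove(robot)
        open_tasks.remove(task)
    return assignment

bids = {"r1": {"t1": 3, "t2": 5}, "r2": {"t1": 4, "t2": 2}}
print(allocate_tasks(bids))   # {'t2': 'r2', 't1': 'r1'}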
Adaptive Strategies in the Multi-agent “Predator-Prey” Models
TLDR
The authors propose to use stochastic game methods to develop a recurrent game method and algorithm for forming coordinated agent strategies through the minimization of average loss functions.
Emergence of Coordination in Growing Decision-Making Organizations: The Role of Complexity, Search Strategy, and Cost of Effort
TLDR
Results support the conjecture that increasing complexity leads to more hierarchical coordination, and reveal that the cost of effort for implementing new solutions, in conjunction with the search strategy, may markedly affect the emerging form of coordination.
...
...

References

Showing 1-10 of 41 references
Policy search for multi-robot coordination under uncertainty
TLDR
A principled method based on a general model of multi-robot cooperative planning in the presence of stochasticity, uncertain sensing, and communication limitations is introduced and can solve significantly larger problems than existing MacDec-POMDP planners.
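Policy search in this setting typically optimizes per-robot policies represented as finite-state controllers mapping observation histories to (macro-)actions. A minimal encoding of that representation, with field names and the example tables invented for the sketch:

from dataclasses import dataclass, field

@dataclass
class FiniteStateController:
    # node -> (macro-)action to execute, and (node, observation) -> next node;
    # policy search edits these tables to maximize expected team reward
    action_of: dict = field(default_factory=dict)
    next_node: dict = field(default_factory=dict)
    node: int = 0

    def act(self):
        return self.action_of[self.node]

    def observe(self, o):
        self.node = self.next_node[(self.node, o)]

fsc = FiniteStateController(
    action_of={0: "navigate_to_dock", 1: "pick_item"},
    next_node={(0, "arrived"): 1, (0, "moving"): 0, (1, "picked"): 0},
)
a = fsc.act(); fsc.observe("arrived")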
Planning for decentralized control of multiple robots under uncertainty
TLDR
This paper uses three variants of a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate, and demonstrates that the algorithmic framework can automatically optimize control and communication policies for complex multi-robot systems.
Learning for multi-robot cooperation in partially observable stochastic environments with macro-actions
TLDR
This work proposes an iterative sampling-based Expectation-Maximization algorithm (iSEM) to learn policies using only trajectory data containing observations, macro-actions, and rewards, and shows the algorithm is able to achieve better solution quality than the state-of-the-art learning-based methods.
A Concise Introduction to Decentralized POMDPs
This book introduces multiagent planning under uncertainty as formalized by decentralized partially observable Markov decision processes (Dec-POMDPs). The intended audience is researchers and …
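The model the book formalizes is commonly written as a tuple ⟨I, S, {A_i}, T, R, {Ω_i}, O, h⟩; the Python encoding below is one convenient (and not canonical) way to hold those components:

from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class DecPOMDP:
    agents: Sequence[int]            # I: agent indices
    states: Sequence                 # S: world states
    actions: Sequence[Sequence]      # A_i: one action set per agent
    transition: Callable             # T(s, joint_a, s') -> probability
    reward: Callable                 # R(s, joint_a) -> shared team reward
    observations: Sequence[Sequence] # Omega_i: one observation set per agent
    obs_fn: Callable                 # O(joint_a, s', joint_o) -> probability
    horizon: int                     # h: planning horizon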
Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions
TLDR
Algorithms for solving Dec-POSMDPs are presented; these are more scalable than previous methods, since they can incorporate closed-loop belief space macro-actions in planning and are automatically constructed to produce robust solutions.
Decentralized control of Partially Observable Markov Decision Processes using belief space macro-actions
TLDR
The proposed Dec-POSMDP formulation allows asynchronous decision-making by the robots, which is crucial in multi-robot domains, and an algorithm for solving this Dec-POSMDP which is much more scalable than previous methods since it can incorporate closed-loop belief space macro-actions in planning.
Learning for Decentralized Control of Multiagent Systems in Large, Partially-Observable Stochastic Environments
TLDR
A policy-based reinforcement learning approach that learns the agent policies based solely on trajectories generated by previous interaction with the environment and is able to generate valid macro-action controllers; an expectation-maximization algorithm (called Policy-based EM or PoEM), which has convergence guarantees for batch learning, is developed.
Hysteretic Q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams
TLDR
This article focuses on decentralized reinforcement learning (RL) in cooperative MAS, where a team of independent learning robots (ILs) try to coordinate their individual behavior to reach a coherent joint behavior, and suggests a Q-learning extension for ILs, called hysteretic Q-learning.
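The update itself is compact: two learning rates with alpha > beta, the smaller applied to negative temporal-difference errors, so an optimistic agent does not unlearn a good joint action because of teammates' exploration. A sketch with illustrative hyperparameter values:

import numpy as np

def hysteretic_update(Q, s, a, r, s2, alpha=0.1, beta=0.01, gamma=0.95):
    # asymmetric learning rates: increases are trusted (alpha),
    # decreases are absorbed slowly (beta < alpha)
    td = r + gamma * Q[s2].max() - Q[s, a]
    Q[s, a] += (alpha if td >= 0 else beta) * td
    return Q

Q = hysteretic_update(np.zeros((3, 2)), 0, 1, 1.0, 2)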
Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
TLDR
A decentralized single-task learning approach that is robust to concurrent interactions of teammates is introduced, and an approach for distilling single-task policies into a unified policy that performs well across multiple related tasks, without explicit provision of task identity, is presented.
Policy Gradient With Value Function Approximation For Collective Multiagent Planning
TLDR
This work shows how a particular decomposition of the approximate action-value function over agents leads to effective updates, derives a new way to train the critic based on local reward signals, and shows that this new AC approach provides better-quality solutions than previous best approaches.
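A tabular actor-critic step in that spirit, with the critic trained from a local reward signal as the TLDR describes; the tabular softmax parameterization stands in for the paper's function approximation, and all rates are illustrative:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ac_step(theta, V, s, a, r_local, s2, alpha=0.05, beta=0.1, gamma=0.95):
    td = r_local + gamma * V[s2] - V[s]   # TD error doubles as the advantage
    V[s] += beta * td                     # critic trained on the local reward
    grad = -softmax(theta[s])
    grad[a] += 1.0                        # grad of log softmax policy at (s, a)
    theta[s] += alpha * td * grad         # first-order policy update
    return theta, V

theta, V = ac_step(np.zeros((4, 3)), np.zeros(4), 0, 2, 1.0, 1)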
...
...