Combining Policy Search with Planning in Multi-agent Cooperation

@inproceedings{Ma2008CombiningPS,
  title={Combining Policy Search with Planning in Multi-agent Cooperation},
  author={Jie Ma and Stephen Cameron},
  booktitle={RoboCup},
  year={2008}
}
It is cooperation that essentially differentiates multi-agent systems (MASs) from single-agent intelligence. Key Method We propose a novel method called Policy Search Planning (PSP), in which policy search is used to find an optimal policy for selecting plans from a plan pool. PSP extends an existing gradient-search method (GPOMDP) to a MAS domain. We demonstrate how PSP can be used in RoboCup Simulation, and our experimental results show robustness, adaptivity, and improved performance over other methods.
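The abstract describes policy search (GPOMDP-style gradient estimation) over a discrete pool of plans. The following is a minimal sketch of that idea under invented assumptions, not the paper's actual algorithm: a softmax policy over a hypothetical 4-plan pool, trained with a GPOMDP-style eligibility trace of score-function gradients, on a toy environment where plan 0 is always best.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a pool of 4 candidate plans and a 3-feature
# observation of the situation; the policy is a softmax over plans.
NUM_PLANS, NUM_FEATURES = 4, 3
theta = np.zeros((NUM_PLANS, NUM_FEATURES))

def plan_probs(obs, params):
    """Softmax probability of selecting each plan from the pool."""
    logits = params @ obs
    e = np.exp(logits - logits.max())
    return e / e.sum()

def gpomdp_update(params, env_step, obs, horizon=50, beta=0.9, lr=0.01):
    """One GPOMDP-style episode: keep a discounted eligibility trace of
    score-function gradients and weight it by each step's reward."""
    trace = np.zeros_like(params)
    grad = np.zeros_like(params)
    for _ in range(horizon):
        probs = plan_probs(obs, params)
        plan = rng.choice(NUM_PLANS, p=probs)
        score = -np.outer(probs, obs)          # d log pi / d params
        score[plan] += obs
        trace = beta * trace + score
        obs, reward = env_step(plan, obs)
        grad += reward * trace
    return params + lr * grad

# Toy stand-in for the real environment: plan 0 is always the best choice.
def toy_env(plan, obs):
    return obs, (1.0 if plan == 0 else 0.0)

obs0 = np.ones(NUM_FEATURES)
for _ in range(100):
    theta = gpomdp_update(theta, toy_env, obs0)
```

After training, `plan_probs(obs0, theta)` concentrates on plan 0; in PSP the reward signal would instead come from the simulated match rather than this hard-coded toy environment.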
From Semantics to Execution: Integrating Action Planning With Reinforcement Learning for Robotic Causal Problem-Solving
TLDR
The paper demonstrates how reward sparsity can serve as a bridge between the high-level and low-level state and action spaces, and that the integrated method is able to solve robotic tasks that involve non-trivial causal dependencies under noisy conditions, exploiting both data and knowledge.
From semantics to execution: Integrating action planning with reinforcement learning for robotic tool use
TLDR
It is demonstrated that the integrated neuro-symbolic method is able to solve object manipulation problems that involve tool use and non-trivial causal dependencies under noisy conditions, exploiting both data and knowledge.
Cooperative and Distributed Reinforcement Learning of Drones for Field Coverage
TLDR
This paper can serve as an efficient framework for using MARL to enable a UAV team to work in an environment whose model is unavailable, accounting for the stochastic aspects of the problem and the technical aspects of a real-world deployment, where uncertainties such as wind and other environmental dynamics are present.
Local Patch AutoAugment with Multi-Agent Collaboration
TLDR
This paper proposes a more fine-grained automated DA approach, dubbed Patch AutoAugment, which divides an image into a grid of patches and searches for the jointly optimal augmentation policies for the patches; it outperforms state-of-the-art DA methods while requiring fewer computational resources.
Reinforcement learning for robot soccer
TLDR
Several variants of the general batch learning framework are discussed, particularly tailored to the use of multilayer perceptrons to approximate value functions over continuous state spaces, which are successfully used to learn crucial skills in soccer-playing robots participating in the RoboCup competitions.
OxBlue2009 (2D) Team Description
TLDR
OxBlue2009 (2D) is a robot football team for RoboCup 2D simulation; the team's decision structure is presented and its fundamental cooperative behaviour, formation, is briefly described.
Patch AutoAugment
TLDR
A patch-level automatic DA algorithm called Patch AutoAugment (PAA), which allows each patch DA operation to be controlled by an agent and models it as a Multi-Agent Reinforcement Learning (MARL) problem.
On the Convergence of Competitive, Multi-Agent Gradient-Based Learning
TLDR
A general framework for competitive gradient-based learning is introduced that encompasses a wide breadth of learning algorithms including policy gradient reinforcement learning, gradient based bandits, and certain online convex optimization algorithms.
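This line concerns the convergence of simultaneous gradient-based learners in competitive settings. A minimal illustration of the phenomenon such a framework studies, assuming the classic bilinear zero-sum game f(x, y) = xy (an invented example, not one taken from the paper): naive simultaneous gradient descent/ascent spirals away from the equilibrium at the origin, while an extragradient (look-ahead) variant contracts toward it.

```python
# Bilinear zero-sum game f(x, y) = x * y: player 1 minimises over x,
# player 2 maximises over y; the unique equilibrium is (0, 0).

def gda(x, y, lr=0.1, steps=1000):
    """Naive simultaneous gradient descent/ascent: diverges on this game."""
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x      # df/dx = y, df/dy = x
    return x, y

def extragradient(x, y, lr=0.1, steps=1000):
    """Extragradient: evaluate gradients at a look-ahead point first."""
    for _ in range(steps):
        xh, yh = x - lr * y, y + lr * x    # look-ahead step
        x, y = x - lr * yh, y + lr * xh    # update with look-ahead gradients
    return x, y
```

Starting from (1, 1), `gda` spirals outward (each step multiplies the distance to the origin by sqrt(1 + lr^2)), whereas `extragradient` converges close to (0, 0).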
Design Distributed Control and Learning Algorithms for a Team of UAVs for Optimal Field Coverage
  • H. Pham
  • Computer Science, Environmental Science
  • 2018
TLDR
This thesis proposes a decentralized control algorithm for a team of UAVs that can autonomously and actively track a spreading fire boundary in a distributed manner, and utilizes a model-free learning algorithm to solve the optimal-coverage problem for a static field of arbitrary shape.
Designing Deception in Adversarial Reinforcement Learning
TLDR
This paper illustrates deception as a complementary policy to direct objective satisfaction and generates a library of deceptive tactics that can be handled seamlessly, without heavily customising the function approximator or learning algorithm.
...

References

SHOWING 1-10 OF 32 REFERENCES
Combining Planning with Reinforcement Learning for Multi-robot Task Allocation
TLDR
It is demonstrated that this dynamic scheduler can learn not only to allocate robots to tasks efficiently, but also to position the robots appropriately in readiness for new tasks, and conserve resources over the long run.
Flexible Coordination of Multiagent Team Behavior Using HTN Planning
TLDR
This work introduces an approach using Hierarchical Task Network planning in each of the agents for high-level coordination and description of team strategies that facilitates the maintenance of expert knowledge specified as team strategies separated from the agent implementation.
Shaping multi-agent systems with gradient reinforcement learning
TLDR
This work designs simple reactive agents in a decentralized way as independent learners and develops an incremental learning algorithm where agents face a sequence of progressively more complex tasks.
Using a Planner for Coordination of Multiagent Team Behavior
TLDR
This work presents an approach to coordinate the behavior of a multiagent team using an HTN planning procedure, using planners in each of the agents to incorporate domain knowledge and to make agents follow a specified team strategy.
Policy-Gradient Methods for Planning
TLDR
This work demonstrates the application of reinforcement learning, in the form of a policy-gradient method, to large domains that are infeasible for dynamic programming, including concurrent durative tasks, multiple uncertain outcomes, and limited resources.
Planning and Scheduling Ingredients for a Multi-Agent System
TLDR
The result, although preliminary, shows how such planning and scheduling components can contribute an interesting coordination service for realistic applications in a multi-agent setting.
FF + FPG: Guiding a Policy-Gradient Planner
TLDR
This paper shows how to use an external teacher to guide FPG's exploration using the actions suggested by FF's heuristic (Hoffmann 2001), as FF-replan has proved efficient for probabilistic re-planning.
Combining Reinforcement Learning with Symbolic Planning
TLDR
This paper proposes a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner, and shows significant improvements in scaling-up behaviour as the state-space grows larger, compared to both standard Q- learning and hierarchical Q-learning methods.
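The TLDR above describes coupling a symbolic planner with a Q-learner. A toy sketch of that coupling, assuming an invented 1-D corridor domain (states 0 to 9) and a hard-coded subgoal list rather than the paper's STRIPS machinery: the plan supplies an ordered list of subgoals, and a tabular Q-learner learns the low-level actions that achieve each subgoal in turn.

```python
import random

random.seed(0)

N, ACTIONS = 10, (-1, +1)      # corridor states 0..9, actions: move left/right
plan = [3, 6, 9]               # subgoal sequence from a hypothetical planner

Q = {}                         # Q[(subgoal, state, action)]
def q(sg, s, a):
    return Q.get((sg, s, a), 1.0)   # optimistic init drives exploration

def learn(episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    for _ in range(episodes):
        s = 0
        for sg in plan:                         # pursue subgoals in plan order
            for _ in range(50):
                if s == sg:
                    break
                if random.random() < eps:
                    a = random.choice(ACTIONS)
                else:
                    a = max(ACTIONS, key=lambda b: q(sg, s, b))
                s2 = min(N - 1, max(0, s + a))
                r = 1.0 if s2 == sg else 0.0    # reward only at the subgoal
                target = r if s2 == sg else r + gamma * max(q(sg, s2, b) for b in ACTIONS)
                Q[(sg, s, a)] = q(sg, s, a) + alpha * (target - q(sg, s, a))
                s = s2

def rollout():
    """Follow the learned greedy policy through the whole plan."""
    s = 0
    for sg in plan:
        for _ in range(20):
            if s == sg:
                break
            s = min(N - 1, max(0, s + max(ACTIONS, key=lambda b: q(sg, s, b))))
    return s

learn()
```

After `learn()`, `rollout()` reaches the final subgoal, state 9. In PLANQ the subgoals come from a STRIPS planner's operator sequence rather than a fixed list, which is precisely what gives the reported scaling benefit over flat Q-learning.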
The factored policy-gradient planner
The Factored Policy Gradient planner (IPC-06 Version)
TLDR
The Factored Policy Gradient planner is presented: a probabilistic temporal planner designed to scale to large planning domains by applying two significant approximations; its policy is factored into a per-action mapping from a partial observation to the probability of executing that action, reflecting how desirable each action is.
...