Learn More
We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partiallyobservable Markov decision problems (DECPOMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multirobot coordination,(More)
In the domain of decentralized Markov decision processes, we develop the first complete and optimal algorithm that is able to extract deterministic policy vectors based on finite state controllers for a cooperative team of agents. Our algorithm applies to the discounted infinite horizon case and extends best-first search methods to the domain of(More)
Self-Organizing maps (SOM) have become popular for tasks in data visualization, pattern classification or natural language processing and can be seen as one of the major concepts for artificial neural networks of today. Their general idea is to approximate a high dimensional and previously unknown input distribution by a lower dimensional neural network(More)
We present a new algorithm for cooperative reinforcement learning in multiagent systems. Our main concern is the correct coordination between the members of the team: We seek to obtain an optimal solution for the team as a whole while keeping the learning as much decentralized as possible. We consider autonomous and independently learning agents that do not(More)
We present a new algorithm for cooperative reinforcement learning in multiagent systems. We consider autonomous and independently learning agents, and we seek to obtain an optimal solution for the team as a whole while keeping the learning as much decentralized as possible. Coordination between agents occurs through communication, namely the mutual(More)
RÉSUMÉ. Nous présentons un nouvel algorithme de planification pour la construction de systèmes multi-agents réactifs et situés pouvant se modéliser par des processus de décision de Markov décentralisés (DEC-POMDP). Cet algorithme est fondé sur la programmation dynamique à base de points. Il est dérivé de techniques de programmation dynamique optimale(More)
  • 1