Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, as well as local information. Prevalent solution techniques are centralized and model based—limitations that we address with distributed reinforcement learning (RL). We particularly favor alternate learning, in which agents take turns learning best responses to each other's fixed policies, an approach that appears to outperform concurrent RL. However, alternate learning requires an initial policy. We propose two principled approaches to generating informed initial policies: a naive approach that lays the foundation for a more sophisticated approach. We empirically demonstrate that the refined approach produces near-optimal solutions in many challenging benchmark settings, staking a claim to being an efficient (and realistic) approximate solver in its own right. Furthermore, alternate best response learning seeded with such policies quickly learns high-quality policies as well.
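The alternate-learning idea above can be illustrated with a minimal sketch. The example below is a toy, not the paper's algorithm: it uses a hypothetical two-agent cooperative matrix game with identical payoffs, where each agent in turn Q-learns a best response while the other's policy is held fixed. It also hints at why the initial policy matters, since different seeds converge to different joint policies.

```python
import random

# Toy shared payoff matrix (hypothetical values): both agents receive
# REWARD[a][b]. Coordinating on action 1 is optimal; the matrix is
# symmetric, so one best-response helper serves both agents.
REWARD = [[5.0, 0.0],
          [0.0, 10.0]]

def best_response(opponent_action, episodes=2000, alpha=0.1, eps=0.1):
    """Epsilon-greedy Q-learning of a best response to a fixed opponent action."""
    q = [0.0, 0.0]
    for _ in range(episodes):
        # Explore with probability eps, otherwise act greedily.
        a = random.randrange(2) if random.random() < eps else q.index(max(q))
        r = REWARD[a][opponent_action]
        q[a] += alpha * (r - q[a])
    return q.index(max(q))

def alternate_learning(initial_policy_b, rounds=4):
    """Agents alternate: each learns a best response to the other's fixed policy."""
    pol_b = initial_policy_b
    for _ in range(rounds):
        pol_a = best_response(pol_b)  # agent A responds to fixed B
        pol_b = best_response(pol_a)  # agent B responds to fixed A
    return pol_a, pol_b
```

Seeding agent B at action 1 lets the alternation settle on the optimal joint policy (1, 1), whereas seeding at action 0 locks the agents into the inferior equilibrium (0, 0), which is the sense in which an informed initial policy pays off.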