A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint

Abstract
In a teacher-student framework, a more experienced agent (the teacher) helps accelerate the learning of another agent (the student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student may fail to cooperate effectively with others even when following a teacher's suggested actions, because the policies of all agents keep changing before convergence. When the number of times that agents can communicate with one another is limited (i.e., under budget constraints), an advising strategy that uses actions as advice can therefore be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning under budget constraints. In PSAF, each Q-learner can decide when to ask for Q-values and when to share its own Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results show that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.
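The abstract does not spell out PSAF's exact decision rules, so the following is only an illustrative sketch of budget-constrained Q-value advising between independent Q-learners. The visit-count confidence heuristic, the averaging rule for received Q-values, and all class and parameter names (`PSAFAgent`, `ask_budget`, `share_budget`, `threshold`) are assumptions for illustration, not the paper's actual algorithm.

```python
import random
from collections import defaultdict


class PSAFAgent:
    """Q-learner that may ask peers for Q-values (partaker) and answer
    such requests (sharer), each side under its own communication budget.
    Illustrative sketch only; heuristics and names are assumed."""

    def __init__(self, n_actions, ask_budget=100, share_budget=100,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.visits = defaultdict(int)   # per-state visit counts
        self.ask_budget = ask_budget     # remaining asking budget (assumed)
        self.share_budget = share_budget # remaining sharing budget (assumed)
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def confidence(self, state):
        # Assumed heuristic: more visits to a state -> higher confidence.
        return self.visits[state]

    def maybe_ask(self, state, peers, threshold=5):
        """Request Q-values for `state` from the most confident peer
        when our own estimate looks unreliable and budget remains."""
        if self.ask_budget <= 0 or self.confidence(state) >= threshold:
            return
        sharer = max(peers, key=lambda p: p.confidence(state))
        advice = sharer.maybe_share(state, threshold)
        if advice is not None:
            self.ask_budget -= 1
            # Blend the shared Q-values into our own estimate (assumed rule).
            self.q[state] = [(a + b) / 2 for a, b in zip(self.q[state], advice)]

    def maybe_share(self, state, threshold=5):
        """Answer a request only if confident enough and budget remains."""
        if self.share_budget <= 0 or self.confidence(state) < threshold:
            return None
        self.share_budget -= 1
        return list(self.q[state])

    def act(self, state):
        # Standard epsilon-greedy action selection.
        self.visits[state] += 1
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        qs = self.q[state]
        return qs.index(max(qs))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In use, a novice agent calls `maybe_ask` on unfamiliar states before acting; each successful exchange decrements both the asker's and the sharer's budgets, so communication stops once either budget is exhausted.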